Server Admin Log/Archive 97

From Wikitech


2025-09-30

  • 23:55 krinkle@deploy2002: krinkle: Continuing with sync
  • 23:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 23:54 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on Wikidata (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:46 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Wikidata (T403510)
  • 23:36 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wiktionary (T403510) (duration: 12m 27s)
  • 23:31 krinkle@deploy2002: krinkle: Continuing with sync
  • 23:29 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on Wiktionary (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:24 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Wiktionary (T403510)
  • 23:17 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wikisource (T403510) (duration: 14m 01s)
  • 23:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 23:12 krinkle@deploy2002: krinkle: Continuing with sync
  • 23:10 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on Wikisource (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:03 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Wikisource (T403510)
  • 22:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 21:56 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 21:51 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 21:51 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 21:51 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 20:49 tgr_: UTC late deploys done
  • 20:49 ejegg: fundraising civicrm upgraded from 3ca0872a to 7c31a25c
  • 20:47 tgr@deploy2002: Finished scap sync-world: Backport for Enable JWT session cookies on group0 (T399631) (duration: 15m 27s)
  • 20:42 tgr@deploy2002: tgr: Continuing with sync
  • 20:39 tgr@deploy2002: tgr: Backport for Enable JWT session cookies on group0 (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:36 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs
  • 20:35 bking@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 10s)
  • 20:35 bking@deploy2002: Started deploy [wdqs/wdqs@fea7794]: T405978
  • 20:34 bking@deploy2002: Finished deploy [wdqs/wdqs@fea7794]: T405978 (duration: 00m 20s)
  • 20:33 bking@deploy2002: Started deploy [wdqs/wdqs@fea7794]: T405978
  • 20:32 tgr@deploy2002: Started scap sync-world: Backport for Enable JWT session cookies on group0 (T399631)
  • 20:29 dani@deploy2002: Finished scap sync-world: Backport for Remove reader foundational survey on enwiki (beta) (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577), Update reader foundational survey on enwiki (T405410), Enable USERLANGUAGE for sourceswiki (T406050) (duration: 21m 42s)
  • 20:27 ebysans@deploy2002: Finished deploy [analytics/refinery@c5c78d1] (thin): Regular analytics weekly train THIN [analytics/refinery@c5c78d17] (duration: 00m 57s)
  • 20:26 ebysans@deploy2002: Started deploy [analytics/refinery@c5c78d1] (thin): Regular analytics weekly train THIN [analytics/refinery@c5c78d17]
  • 20:25 ebysans@deploy2002: Finished deploy [analytics/refinery@c5c78d1]: Regular analytics weekly train [analytics/refinery@c5c78d17] (duration: 04m 43s)
  • 20:24 dani@deploy2002: dani, jhsoby: Continuing with sync
  • 20:22 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy minor UI tweak for improved DSL viewing - swfrench@cumin2002"
  • 20:22 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy minor UI tweak for improved DSL viewing - swfrench@cumin2002
  • 20:21 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy minor UI tweak for improved DSL viewing - swfrench@cumin2002
  • 20:21 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy minor UI tweak for improved DSL viewing - swfrench@cumin2002"
  • 20:21 ebysans@deploy2002: Started deploy [analytics/refinery@c5c78d1]: Regular analytics weekly train [analytics/refinery@c5c78d17]
  • 20:15 ebysans@deploy2002: Finished deploy [analytics/refinery@c5c78d1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c5c78d17] (duration: 01m 00s)
  • 20:14 dani@deploy2002: dani, jhsoby: Backport for Remove reader foundational survey on enwiki (beta) (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577), Update reader foundational survey on enwiki (T405410), Enable USERLANGUAGE for sourceswiki (T406050) synced to the testservers (see https://wikitech.w
  • 20:13 ebysans@deploy2002: Started deploy [analytics/refinery@c5c78d1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c5c78d17]
  • 20:12 SandraEbele_: Deploying Refinery as part of deployment weekly train
  • 20:07 dani@deploy2002: Started scap sync-world: Backport for Remove reader foundational survey on enwiki (beta) (T405410), Increase coverage of Design Research participant recruitment survey on jawiki (T405577), Update reader foundational survey on enwiki (T405410), Enable USERLANGUAGE for sourceswiki (T406050)
  • 20:07 SandraEbele_: refinery-source deployment paused due to maven release error
  • 20:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 19:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet
  • 19:16 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1003.eqiad.wmnet
  • 19:16 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet
  • 19:09 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1002.eqiad.wmnet
  • 19:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet
  • 19:02 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1001.eqiad.wmnet
  • 19:01 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet
  • 19:01 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1001.eqiad.wmnet
  • 19:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 19:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 19:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T401906)', diff saved to https://phabricator.wikimedia.org/P83515 and previous config saved to /var/cache/conftool/dbconfig/20250930-190012-fceratto.json
  • 18:53 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy DSL rendering for known_client objects - swfrench@cumin2002"
  • 18:53 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy DSL rendering for known_client objects - swfrench@cumin2002
  • 18:52 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy DSL rendering for known_client objects - swfrench@cumin2002
  • 18:52 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy DSL rendering for known_client objects - swfrench@cumin2002"
  • 18:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer scholarly graph to newly-reimaged host) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2016.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 18:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P83514 and previous config saved to /var/cache/conftool/dbconfig/20250930-184504-fceratto.json
  • 18:38 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 18:37 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 18:36 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 18:36 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 18:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P83513 and previous config saved to /var/cache/conftool/dbconfig/20250930-182957-fceratto.json
  • 18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T401906)', diff saved to https://phabricator.wikimedia.org/P83512 and previous config saved to /var/cache/conftool/dbconfig/20250930-181449-fceratto.json
  • 18:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T401906)', diff saved to https://phabricator.wikimedia.org/P83511 and previous config saved to /var/cache/conftool/dbconfig/20250930-181340-fceratto.json
  • 18:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 18:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 18:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T401906)', diff saved to https://phabricator.wikimedia.org/P83510 and previous config saved to /var/cache/conftool/dbconfig/20250930-181300-fceratto.json
  • 17:58 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer scholarly graph to newly-reimaged host) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2016.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:58 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer scholarly graph to newly-reimaged host) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2016.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:58 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer scholarly graph to newly-reimaged host) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2016.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P83509 and previous config saved to /var/cache/conftool/dbconfig/20250930-175752-fceratto.json
  • 17:57 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T405978, transfer scholarly graph to newly-reimaged host) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2016.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:57 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T405978, transfer scholarly graph to newly-reimaged host) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2016.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:56 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2016.codfw.wmnet with OS bullseye
  • 17:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 17:46 swfrench@deploy2002: Finished scap sync-world: Non-image-build scap run to switch next and migration releases to PHP 8.3 - T405955 (duration: 04m 29s)
  • 17:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P83508 and previous config saved to /var/cache/conftool/dbconfig/20250930-174245-fceratto.json
  • 17:42 swfrench@deploy2002: Started scap sync-world: Non-image-build scap run to switch next and migration releases to PHP 8.3 - T405955
  • 17:38 jgleeson: payments-wiki upgraded from dc7cda24 to 2b281477
  • 17:29 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new PHP 8.3 production images (duration: 25m 33s)
  • 17:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T401906)', diff saved to https://phabricator.wikimedia.org/P83507 and previous config saved to /var/cache/conftool/dbconfig/20250930-172738-fceratto.json
  • 17:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T401906)', diff saved to https://phabricator.wikimedia.org/P83506 and previous config saved to /var/cache/conftool/dbconfig/20250930-172628-fceratto.json
  • 17:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 17:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T401906)', diff saved to https://phabricator.wikimedia.org/P83505 and previous config saved to /var/cache/conftool/dbconfig/20250930-172605-fceratto.json
  • 17:16 SandraEbele_: starting refinery-source deployment as part of weekly deployment train
  • 17:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:13 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:13 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 17:11 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P83503 and previous config saved to /var/cache/conftool/dbconfig/20250930-171058-fceratto.json
  • 17:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 17:04 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new PHP 8.3 production images
  • 16:56 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@f3216ec] (releasing): test (duration: 01m 02s)
  • 16:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P83502 and previous config saved to /var/cache/conftool/dbconfig/20250930-165550-fceratto.json
  • 16:55 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@f3216ec] (releasing): test
  • 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T401906)', diff saved to https://phabricator.wikimedia.org/P83501 and previous config saved to /var/cache/conftool/dbconfig/20250930-164043-fceratto.json
  • 16:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T401906)', diff saved to https://phabricator.wikimedia.org/P83500 and previous config saved to /var/cache/conftool/dbconfig/20250930-163933-fceratto.json
  • 16:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 16:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T401906)', diff saved to https://phabricator.wikimedia.org/P83499 and previous config saved to /var/cache/conftool/dbconfig/20250930-163910-fceratto.json
  • 16:38 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@f3216ec] (releasing): test (duration: 00m 31s)
  • 16:37 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@f3216ec] (releasing): test
  • 16:36 tgr@deploy2002: Finished scap sync-world: Backport for Revert "session: Enable MultiBackendSessionStore on `group1` wikis" (duration: 12m 49s)
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 16:31 tgr@deploy2002: d3r1ck01, tgr: Continuing with sync
  • 16:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 16:30 tgr@deploy2002: d3r1ck01, tgr: Backport for Revert "session: Enable MultiBackendSessionStore on `group1` wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:29 andrewbogott: reprepro copy bookworm-wikimedia trixie-wikimedia helm3
  • 16:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P83498 and previous config saved to /var/cache/conftool/dbconfig/20250930-162402-fceratto.json
  • 16:23 tgr@deploy2002: Started scap sync-world: Backport for Revert "session: Enable MultiBackendSessionStore on `group1` wikis"
  • 16:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:15 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:15 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:12 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2016.codfw.wmnet with OS bullseye
  • 16:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts wdqs2016.codfw.wmnet
  • 16:11 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host wdqs2016.codfw.wmnet
  • 16:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P83497 and previous config saved to /var/cache/conftool/dbconfig/20250930-160855-fceratto.json
  • 16:01 cdanis@deploy2002: Finished scap sync-world: Backport for intake-logging EventGate: store x-ja3n req hdr (duration: 14m 01s)
  • 15:56 swfrench-wmf: reprepro include php8.3_8.3.25-1+wmf11u2 in component/php83
  • 15:55 cdanis@deploy2002: cdanis: Continuing with sync
  • 15:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T401906)', diff saved to https://phabricator.wikimedia.org/P83496 and previous config saved to /var/cache/conftool/dbconfig/20250930-155347-fceratto.json
  • 15:52 cdanis@deploy2002: cdanis: Backport for intake-logging EventGate: store x-ja3n req hdr synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T401906)', diff saved to https://phabricator.wikimedia.org/P83495 and previous config saved to /var/cache/conftool/dbconfig/20250930-155223-fceratto.json
  • 15:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T401906)', diff saved to https://phabricator.wikimedia.org/P83494 and previous config saved to /var/cache/conftool/dbconfig/20250930-155200-fceratto.json
  • 15:50 ejegg: donorwiki upgraded from 41b5ac89 to dc7cda24
  • 15:49 ejegg: payments-wiki upgraded from 3e533e02 to dc7cda24
  • 15:47 cdanis@deploy2002: Started scap sync-world: Backport for intake-logging EventGate: store x-ja3n req hdr
  • 15:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P83493 and previous config saved to /var/cache/conftool/dbconfig/20250930-153653-fceratto.json
  • 15:34 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 15:34 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P83491 and previous config saved to /var/cache/conftool/dbconfig/20250930-152146-fceratto.json
  • 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@41325d8]: deploy phab1004 for T406041 (duration: 00m 59s)
  • 15:08 brennen@deploy2002: Started deploy [phabricator/deployment@41325d8]: deploy phab1004 for T406041
  • 15:08 brennen@deploy2002: Finished deploy [phabricator/deployment@41325d8]: deploy phab2002 for T406041 (duration: 00m 31s)
  • 15:07 brennen@deploy2002: Started deploy [phabricator/deployment@41325d8]: deploy phab2002 for T406041
  • 15:06 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T401906)', diff saved to https://phabricator.wikimedia.org/P83490 and previous config saved to /var/cache/conftool/dbconfig/20250930-150638-fceratto.json
  • 15:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T401906)', diff saved to https://phabricator.wikimedia.org/P83489 and previous config saved to /var/cache/conftool/dbconfig/20250930-150529-fceratto.json
  • 15:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T401906)', diff saved to https://phabricator.wikimedia.org/P83488 and previous config saved to /var/cache/conftool/dbconfig/20250930-150448-fceratto.json
  • 15:04 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phab deploy
  • 15:04 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phab deploy
  • 15:03 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 15:03 dzahn@cumin2002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
  • 15:02 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 15:02 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:58 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:57 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P83486 and previous config saved to /var/cache/conftool/dbconfig/20250930-144940-fceratto.json
  • 14:48 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:45 dancy@deploy2002: Installation of scap version "4.213.0" completed for 2 hosts
  • 14:43 dancy@deploy2002: Installing scap version "4.213.0" for 2 host(s)
  • 14:42 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:41 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:41 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:40 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:40 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:40 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:39 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P83485 and previous config saved to /var/cache/conftool/dbconfig/20250930-143433-fceratto.json
  • 14:30 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:29 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:29 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:29 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 14:27 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:27 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2016.codfw.wmnet
  • 14:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2016.codfw.wmnet
  • 14:25 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T401906)', diff saved to https://phabricator.wikimedia.org/P83484 and previous config saved to /var/cache/conftool/dbconfig/20250930-141925-fceratto.json
  • 14:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T401906)', diff saved to https://phabricator.wikimedia.org/P83483 and previous config saved to /var/cache/conftool/dbconfig/20250930-141816-fceratto.json
  • 14:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 14:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:55 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:53 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group1` wikis (T402808) (duration: 14m 40s)
  • 13:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Continuing with sync
  • 13:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 13:45 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Backport for session: Enable MultiBackendSessionStore on `group1` wikis (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:38 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group1` wikis (T402808)
  • 13:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:33 jforrester@deploy2002: Finished scap sync-world: Backport for Wikifunctions clients: Enable rich text (HTML) output in embedded calls (T397402) (duration: 12m 15s)
  • 13:28 jforrester@deploy2002: jforrester: Continuing with sync
  • 13:27 jforrester@deploy2002: jforrester: Backport for Wikifunctions clients: Enable rich text (HTML) output in embedded calls (T397402) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:25 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:21 jforrester@deploy2002: Started scap sync-world: Backport for Wikifunctions clients: Enable rich text (HTML) output in embedded calls (T397402)
  • 13:18 mfossati@deploy2002: Finished scap sync-world: Backport for ReaderExperiments' ImageBrowsing: don't collect the HTTP user agent (T403259) (duration: 12m 56s)
  • 13:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:14 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:13 mfossati@deploy2002: mfossati: Continuing with sync
  • 13:12 mfossati@deploy2002: mfossati: Backport for ReaderExperiments' ImageBrowsing: don't collect the HTTP user agent (T403259) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:05 mfossati@deploy2002: Started scap sync-world: Backport for ReaderExperiments' ImageBrowsing: don't collect the HTTP user agent (T403259)
  • 13:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 12:51 bking@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2016.codfw.wmnet with OS bullseye
  • 12:16 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 11:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 11:45 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 11:26 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.21 refs T405677
  • 11:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1087.eqiad.wmnet with OS bullseye
  • 10:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 10:56 Amir1: dropping interwiki table on group0 (T397367)
  • 10:53 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1087.eqiad.wmnet with reason: host reimage
  • 10:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:40 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1087.eqiad.wmnet with reason: host reimage
  • 10:35 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.45.0-wmf.21 refs T405677 (duration: 45m 21s)
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1087.eqiad.wmnet with OS bullseye
  • 10:23 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 10:22 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 10:01 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:50 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.45.0-wmf.21 refs T405677
  • 09:46 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:45 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:45 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:44 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:44 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:37 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:36 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:36 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1086.eqiad.wmnet with OS bullseye
  • 09:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1086.eqiad.wmnet with reason: host reimage
  • 09:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1086.eqiad.wmnet with reason: host reimage
  • 09:04 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 09:02 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 09:00 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:59 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:52 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1086.eqiad.wmnet with OS bullseye
  • 08:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:39 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:37 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:34 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:31 kharlan@deploy2002: Finished scap sync-world: Backport for CheckUser/UserInfoCard: Phase 3 enable by default on pilot wikis (T405342) (duration: 13m 25s)
  • 08:26 kharlan@deploy2002: kharlan: Continuing with sync
  • 08:24 kharlan@deploy2002: kharlan: Backport for CheckUser/UserInfoCard: Phase 3 enable by default on pilot wikis (T405342) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:17 kharlan@deploy2002: Started scap sync-world: Backport for CheckUser/UserInfoCard: Phase 3 enable by default on pilot wikis (T405342)
  • 08:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 08:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 07:55 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 07:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 07:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 07:29 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 07:28 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 07:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 06:42 kharlan@deploy2002: Finished scap sync-world: Backport for Hooks: Enable overriding the hook instance per action (T405239 T404204) (duration: 15m 09s)
  • 06:37 kharlan@deploy2002: kharlan: Continuing with sync
  • 06:33 kharlan@deploy2002: kharlan: Backport for Hooks: Enable overriding the hook instance per action (T405239 T404204) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:27 kharlan@deploy2002: Started scap sync-world: Backport for Hooks: Enable overriding the hook instance per action (T405239 T404204)
  • 05:12 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on druid[1007-1008].eqiad.wmnet with reason: Decommissioning druid_public hosts
  • 04:03 mwpresync@deploy2002: Pruned MediaWiki: 1.45.0-wmf.18 (duration: 03m 50s)
  • 01:18 ejegg: SmashPig upgraded from dc03e91b to 86bde4e4
  • 00:58 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm

2025-09-29

  • 23:46 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 22:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 22:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 22:55 bking@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 22:55 bking@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2017.codfw.wmnet']
  • 22:54 bking@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 22:17 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
  • 22:17 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
  • 22:14 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
  • 22:14 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
  • 21:48 bking@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 21:46 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1191522 T403663 (duration: 06m 44s)
  • 21:45 bking@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 21:40 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1191522 T403663
  • 21:39 btullis@cumin1003: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1209-1236].eqiad.wmnet
  • 21:36 btullis@cumin1003: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1209-1236].eqiad.wmnet
  • 21:33 bking@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 21:28 bking@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs2016.codfw.wmnet with OS bullseye
  • 21:01 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1236.eqiad.wmnet with OS bullseye
  • 21:01 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 20:40 sergi0: end of UTC late backport window
  • 20:38 sgimeno@deploy2002: Finished scap sync-world: Backport for Growth: enable new notifications (T404085) (duration: 13m 45s)
  • 20:33 sgimeno@deploy2002: sgimeno: Continuing with sync
  • 20:32 sgimeno@deploy2002: sgimeno: Backport for Growth: enable new notifications (T404085) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 sgimeno@deploy2002: Started scap sync-world: Backport for Growth: enable new notifications (T404085)
  • 20:23 sgimeno@deploy2002: Finished scap sync-world: Backport for Enable $wgParserEnableUserLanguage (en) on Wikidata (T405830) (duration: 13m 23s)
  • 20:17 sgimeno@deploy2002: lucaswerkmeister, sgimeno: Continuing with sync
  • 20:16 sgimeno@deploy2002: lucaswerkmeister, sgimeno: Backport for Enable $wgParserEnableUserLanguage (en) on Wikidata (T405830) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 20:09 sgimeno@deploy2002: Started scap sync-world: Backport for Enable $wgParserEnableUserLanguage (en) on Wikidata (T405830)
  • 20:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 20:04 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 20:00 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 20:00 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 19:55 jgleeson: payments-wiki upgraded from bb9fdfdb to 3e533e02
  • 19:48 swfrench@deploy2002: Stopping before sync operations
  • 19:48 swfrench@deploy2002: Started scap sync-world: Non-deploy scap run to initialize mw-script/next helmfile-defaults values - T405955
  • 19:13 jgleeson: civicrm upgraded from 0c89cc9f to b19e2594
  • 19:02 krinkle@deploy2002: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on wikimedia.org wikis (group1) (T403510) (duration: 13m 50s)
  • 18:57 krinkle@deploy2002: krinkle: Continuing with sync
  • 18:55 krinkle@deploy2002: krinkle: Backport for Disable wmgUseMdotRouting on wikimedia.org wikis (group1) (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:48 krinkle@deploy2002: Started scap sync-world: Backport for Disable wmgUseMdotRouting on wikimedia.org wikis (group1) (T403510)
  • 18:23 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 17:36 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:34 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:26 dancy@deploy2002: Finished scap sync-world: Testing T405110 (v2) (duration: 07m 20s)
  • 17:19 dancy@deploy2002: Started scap sync-world: Testing T405110 (v2)
  • 17:17 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 17:16 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:10 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:06 dancy@deploy2002: Stopping before sync operations
  • 17:05 dancy@deploy2002: Started scap sync-world: Testing T405110
  • 17:04 dancy@deploy2002: Installation of scap version "4.212.0" completed for 2 hosts
  • 17:03 dancy@deploy2002: Installing scap version "4.212.0" for 2 host(s)
  • 16:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:45 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:42 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:14 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:14 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:13 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 16:11 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048']
  • 16:11 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048']
  • 16:09 elukey@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048']
  • 16:09 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048']
  • 16:07 elukey@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048']
  • 16:07 elukey@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048']
  • 16:05 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 15:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet
  • 15:56 dancy@deploy2002: Finished scap sync-world: Testing gitinfo fix (T405738) (duration: 11m 16s)
  • 15:56 ejegg: donorwiki upgraded from 52104fab to 41b5ac89
  • 15:55 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5021.eqsin.wmnet
  • 15:54 fabfur: restart haproxy on cp5021 to test utf8ps converter
  • 15:52 tappof@deploy2002: Started restart [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:51 tappof@deploy2002: Finished deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:51 tappof@deploy2002: Started deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:46 tappof@deploy2002: Finished deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values (duration: 00m 15s)
  • 15:46 tappof@deploy2002: Started deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:45 dancy@deploy2002: Started scap sync-world: Testing gitinfo fix (T405738)
  • 15:45 tappof@deploy2002: Finished deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:45 tappof@deploy2002: Started deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:44 dancy@deploy2002: Installation of scap version "4.211.0" completed for 168 hosts
  • 15:44 tappof@deploy2002: Started restart [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:43 tappof@deploy2002: Finished deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:43 tappof@deploy2002: Started deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:42 tappof@deploy2002: Finished deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:41 tappof@deploy2002: Started deploy [performance/navtiming@578b1d3]: Add authenticated mw_context values
  • 15:40 dancy@deploy2002: Installing scap version "4.211.0" for 168 host(s)
  • 15:39 ejegg: fundraising civicrm upgraded from 3771abe2 to 0c89cc9f
  • 15:36 tappof@deploy2002: Finished deploy [performance/navtiming@94fa387]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:36 tappof@deploy2002: Started deploy [performance/navtiming@94fa387]: Add authenticated mw_context values
  • 15:33 tappof@deploy2002: Finished deploy [performance/navtiming@94fa387]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:33 tappof@deploy2002: Started deploy [performance/navtiming@94fa387]: Add authenticated mw_context values
  • 15:33 ejegg: payments-wiki upgraded from bf2864e9 to bb9fdfdb
  • 15:32 ejegg: standalone SmashPig upgraded from 96afe81c to dc03e91b
  • 15:26 tappof@deploy2002: Started restart [performance/navtiming@94fa387]: Add authenticated mw_context values
  • 15:26 tappof@deploy2002: Finished deploy [performance/navtiming@94fa387]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:26 tappof@deploy2002: Started deploy [performance/navtiming@94fa387]: Add authenticated mw_context values
  • 15:24 tappof@deploy2002: Finished deploy [performance/navtiming@94fa387]: Add authenticated mw_context values (duration: 00m 02s)
  • 15:24 tappof@deploy2002: Started deploy [performance/navtiming@94fa387]: Add authenticated mw_context values
  • 15:02 stevemunene@puppetserver1001: conftool action : set/pooled=no; selector: service=(druid-public-broker),name=druid1008.eqiad.wmnet
  • 15:02 stevemunene@puppetserver1001: conftool action : set/pooled=no; selector: service=(druid-public-broker),name=druid1007.eqiad.wmnet
  • 14:46 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 14:34 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy haproxy acl naming refactor and minor UI improvements - swfrench@cumin2002"
  • 14:34 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy haproxy acl naming refactor and minor UI improvements - swfrench@cumin2002
  • 14:33 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy haproxy acl naming refactor and minor UI improvements - swfrench@cumin2002
  • 14:33 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy haproxy acl naming refactor and minor UI improvements - swfrench@cumin2002"
  • 14:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1236.eqiad.wmnet with reason: host reimage
  • 14:23 cgoubert@cumin1003: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check toolhub: maintenance
  • 14:23 cgoubert@cumin1003: START - Cookbook sre.discovery.service-route check toolhub: maintenance
  • 14:21 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1236.eqiad.wmnet with reason: host reimage
  • 14:07 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1236.eqiad.wmnet with OS bullseye
  • 14:06 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1236.eqiad.wmnet with OS bullseye
  • 13:58 awight@deploy2002: mwscript-k8s job started: purgePage.php --wiki=dewiki # T389363
  • 13:57 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ms-be[1086-1088].eqiad.wmnet with reason: awaiting controller swap
  • 13:48 Lucas_WMDE: UTC afternoon backport+config window done (CentralAuth:FixRenameUserLocalLogs maintenance script will keep running for a few hours)
  • 13:48 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki # T398177 (dry run)
  • 13:47 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for FixRenameUserLocalLogs: Improve matching for users renamed multiple times (T398177), FixRenameUserLocalLogs: Ensure field subquery returns just 1 result (T398177) (duration: 11m 24s)
  • 13:42 lucaswerkmeister-wmde@deploy2002: matmarex, lucaswerkmeister-wmde: Continuing with sync
  • 13:41 lucaswerkmeister-wmde@deploy2002: matmarex, lucaswerkmeister-wmde: Backport for FixRenameUserLocalLogs: Improve matching for users renamed multiple times (T398177), FixRenameUserLocalLogs: Ensure field subquery returns just 1 result (T398177) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:36 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:35 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 13:35 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for FixRenameUserLocalLogs: Improve matching for users renamed multiple times (T398177), FixRenameUserLocalLogs: Ensure field subquery returns just 1 result (T398177)
  • 13:35 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:34 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 13:34 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 13:33 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 13:27 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1236.eqiad.wmnet with OS bullseye
  • 13:22 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 13:22 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 13:21 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group0` wikis (T402808) (duration: 17m 52s)
  • 13:21 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 13:21 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1236.eqiad.wmnet with OS bullseye
  • 13:18 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:18 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:15 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:14 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
  • 13:09 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Backport for session: Enable MultiBackendSessionStore on `group0` wikis (T402808) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for session: Enable MultiBackendSessionStore on `group0` wikis (T402808)
  • 12:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1235.eqiad.wmnet with reason: host reimage
  • 12:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1236.eqiad.wmnet with OS bullseye
  • 12:54 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1235.eqiad.wmnet with reason: host reimage
  • 12:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1236.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:49 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1236.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:40 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 12:40 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 12:07 kharlan@deploy2002: Finished scap sync-world: Backport for SI: Fix sorting by status (T405605), UIC: Disable external permission check for Active wikis section (T405889) (duration: 44m 11s)
  • 11:54 kharlan@deploy2002: kharlan: Continuing with sync
  • 11:49 kharlan@deploy2002: kharlan: Backport for SI: Fix sorting by status (T405605), UIC: Disable external permission check for Active wikis section (T405889) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:42 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:42 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 11:23 gehel: pooling wdqs2007 (caught up on lag)
  • 11:23 kharlan@deploy2002: Started scap sync-world: Backport for SI: Fix sorting by status (T405605), UIC: Disable external permission check for Active wikis section (T405889)
  • 11:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 11:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1235.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:12 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1235.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2050.codfw.wmnet
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
  • 10:59 gehel: pooling wdqs2021 and wdqs2011 (caught up on lag)
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
  • 10:53 moritzm: upgrade Envoy on an-web1001 T403663
  • 10:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2050.codfw.wmnet
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2049.codfw.wmnet
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2049.codfw.wmnet
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2049.codfw.wmnet
  • 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2049.codfw.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2048.codfw.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet
  • 10:34 Dreamy_Jazz: Created `global_block_whitelist` on thwikimedia - T400001
  • 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2048.codfw.wmnet
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 10:27 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on dbprov1007.eqiad.wmnet with reason: needs reimage
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 10:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 10:03 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1245.eqiad.wmnet
  • 10:03 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db1245.eqiad.wmnet
  • 10:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 09:41 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: MariaDB package update
  • 09:41 gehel: depooling wdqs2007, wdqs2021 and wdqs2011 (update lag)
  • 09:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 09:31 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 09:31 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 09:24 gehel: restarting blazegraph on wdqs2007, wdqs2021 and wdqs2011 (high thread count)
  • 09:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 09:23 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 09:11 slyngshede@dns1004: END - running authdns-update
  • 09:11 slyngs: Upgrading IDP/CAS-SSO to version 7.1.6.2
  • 09:10 slyngshede@dns1004: START - running authdns-update
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 08:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 08:53 slyngshede@dns1004: END - running authdns-update
  • 08:52 slyngshede@dns1004: START - running authdns-update
  • 08:52 jynus: powercycling db1150 T405885
  • 08:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 08:39 slyngshede@dns1004: END - running authdns-update
  • 08:38 slyngshede@dns1004: START - running authdns-update
  • 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
  • 08:35 elukey: rolled out spicerack 11.9.0 to all cumin nodes
  • 08:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 08:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 08:20 elukey: uploaded spicerack_11.9.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1054.eqiad.wmnet
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1054.eqiad.wmnet
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1054.eqiad.wmnet
  • 08:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1054.eqiad.wmnet
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1053.eqiad.wmnet
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1053.eqiad.wmnet
  • 08:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1053.eqiad.wmnet
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1053.eqiad.wmnet
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1052.eqiad.wmnet
  • 08:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
  • 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
  • 07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1052.eqiad.wmnet
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1051.eqiad.wmnet
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
  • 07:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1051.eqiad.wmnet
  • 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1050.eqiad.wmnet
  • 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
  • 07:37 moritzm: upgrade Envoy on config-master* T403663
  • 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
  • 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1050.eqiad.wmnet
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1049.eqiad.wmnet
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet
  • 07:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet
  • 07:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1049.eqiad.wmnet
  • 06:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1048.eqiad.wmnet
  • 06:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
  • 06:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
  • 06:37 moritzm: upgrade Envoy on chartmuseum hosts T403663
  • 06:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
  • 06:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 50s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-09-28

  • 18:22 moritzm: restart Tomcat on idp1004
  • 18:12 jmm@dns1004: END - running authdns-update
  • 18:11 jmm@dns1004: START - running authdns-update
  • 16:38 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:38 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:38 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:37 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:37 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:37 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:02 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 23s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-09-27

  • 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 31s)
  • 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image

2025-09-26

  • 22:41 rzl: rzl@apt1002:~$ sudo -i reprepro -C component/envoy-future include bullseye-wikimedia /home/rzl/envoyproxy/envoyproxy_1.32.12-1_amd64.changes # T405808
  • 21:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1214.eqiad.wmnet with OS bullseye
  • 21:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1214.eqiad.wmnet with reason: host reimage
  • 21:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1214.eqiad.wmnet with reason: host reimage
  • 21:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1214.eqiad.wmnet with OS bullseye
  • 20:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1210.eqiad.wmnet with OS bullseye
  • 20:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1212.eqiad.wmnet with OS bullseye
  • 20:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1211.eqiad.wmnet with OS bullseye
  • 20:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1209.eqiad.wmnet with OS bullseye
  • 20:35 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1213.eqiad.wmnet with OS bullseye
  • 20:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:31 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1210.eqiad.wmnet with reason: host reimage
  • 20:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1211.eqiad.wmnet with reason: host reimage
  • 20:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1212.eqiad.wmnet with reason: host reimage
  • 20:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1209.eqiad.wmnet with reason: host reimage
  • 20:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1213.eqiad.wmnet with reason: host reimage
  • 20:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1210.eqiad.wmnet with reason: host reimage
  • 20:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1209.eqiad.wmnet with reason: host reimage
  • 20:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1211.eqiad.wmnet with reason: host reimage
  • 20:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1212.eqiad.wmnet with reason: host reimage
  • 20:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1213.eqiad.wmnet with reason: host reimage
  • 19:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1210.eqiad.wmnet with OS bullseye
  • 19:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1209.eqiad.wmnet with OS bullseye
  • 19:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1211.eqiad.wmnet with OS bullseye
  • 19:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1212.eqiad.wmnet with OS bullseye
  • 19:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1213.eqiad.wmnet with OS bullseye
  • 19:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1214.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1219.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1218.eqiad.wmnet with OS bullseye
  • 19:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1216.eqiad.wmnet with OS bullseye
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1215.eqiad.wmnet with OS bullseye
  • 19:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1217.eqiad.wmnet with OS bullseye
  • 19:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1216.eqiad.wmnet with reason: host reimage
  • 19:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1218.eqiad.wmnet with reason: host reimage
  • 19:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1213.eqiad.wmnet with OS bullseye
  • 19:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1214.eqiad.wmnet with OS bullseye
  • 19:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1215.eqiad.wmnet with reason: host reimage
  • 19:17 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1216.eqiad.wmnet with reason: host reimage
  • 19:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1217.eqiad.wmnet with reason: host reimage
  • 19:17 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1215.eqiad.wmnet with reason: host reimage
  • 19:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1219.eqiad.wmnet with reason: host reimage
  • 19:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1217.eqiad.wmnet with reason: host reimage
  • 19:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1218.eqiad.wmnet with reason: host reimage
  • 19:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1219.eqiad.wmnet with reason: host reimage
  • 19:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1213.eqiad.wmnet with OS bullseye
  • 19:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1214.eqiad.wmnet with OS bullseye
  • 19:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1216.eqiad.wmnet with OS bullseye
  • 19:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1215.eqiad.wmnet with OS bullseye
  • 18:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1217.eqiad.wmnet with OS bullseye
  • 18:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1218.eqiad.wmnet with OS bullseye
  • 18:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1219.eqiad.wmnet with OS bullseye
  • 18:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1220.eqiad.wmnet with OS bullseye
  • 18:51 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1223.eqiad.wmnet with OS bullseye
  • 18:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1221.eqiad.wmnet with OS bullseye
  • 18:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1222.eqiad.wmnet with OS bullseye
  • 18:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1224.eqiad.wmnet with OS bullseye
  • 18:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1220.eqiad.wmnet with reason: host reimage
  • 18:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1223.eqiad.wmnet with reason: host reimage
  • 18:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1221.eqiad.wmnet with reason: host reimage
  • 18:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1222.eqiad.wmnet with reason: host reimage
  • 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1224.eqiad.wmnet with reason: host reimage
  • 18:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1220.eqiad.wmnet with reason: host reimage
  • 18:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1221.eqiad.wmnet with reason: host reimage
  • 18:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1222.eqiad.wmnet with reason: host reimage
  • 18:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1223.eqiad.wmnet with reason: host reimage
  • 18:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1224.eqiad.wmnet with reason: host reimage
  • 17:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1220.eqiad.wmnet with OS bullseye
  • 17:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1221.eqiad.wmnet with OS bullseye
  • 17:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1222.eqiad.wmnet with OS bullseye
  • 17:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1223.eqiad.wmnet with OS bullseye
  • 17:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1224.eqiad.wmnet with OS bullseye
  • 17:08 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:07 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:04 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbprov1007.eqiad.wmnet with reason: needs reinstall
  • 17:00 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:00 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 16:52 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 16:51 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 16:47 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:45 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 16:44 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:41 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:40 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:39 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:33 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 16:30 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:12 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 16:11 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:51 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:50 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:43 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:39 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:38 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:35 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:35 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:16 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:13 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:09 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:07 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:52 dancy: Ran `scap clean-images` on deploy1003. Trimmed /srv down to 48% usage. (T401647)
  • 14:35 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:34 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 13:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1225.eqiad.wmnet with OS bullseye
  • 13:59 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1227.eqiad.wmnet with OS bullseye
  • 13:58 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1226.eqiad.wmnet with OS bullseye
  • 13:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:39 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 13:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:35 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:35 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=97) Checking sanitization for wikis tokwiki in section s5
  • 13:34 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1225.eqiad.wmnet with reason: host reimage
  • 13:32 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 13:32 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:32 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 13:32 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:31 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 13:31 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:30 phuedx@deploy2002: Finished scap sync-world: Backport for ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851), lib: Update metrics-platform to fc7678c10a1f (T401380) (duration: 23m 55s)
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1225.eqiad.wmnet with reason: host reimage
  • 13:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1228.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:27 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:25 phuedx@deploy2002: phuedx: Continuing with sync
  • 13:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1229.eqiad.wmnet with OS bullseye
  • 13:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 13:18 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1226.eqiad.wmnet with reason: host reimage
  • 13:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1225.eqiad.wmnet with OS bullseye
  • 13:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1227.eqiad.wmnet with reason: host reimage
  • 13:13 phuedx@deploy2002: phuedx: Backport for ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851), lib: Update metrics-platform to fc7678c10a1f (T401380) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1225.eqiad.wmnet with OS bullseye
  • 13:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1228.eqiad.wmnet with reason: host reimage
  • 13:06 phuedx@deploy2002: Started scap sync-world: Backport for ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851), lib: Update metrics-platform to fc7678c10a1f (T401380)
  • 13:05 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1226.eqiad.wmnet with reason: host reimage
  • 13:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1227.eqiad.wmnet with reason: host reimage
  • 13:04 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 13:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1228.eqiad.wmnet with reason: host reimage
  • 12:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1229.eqiad.wmnet with reason: host reimage
  • 12:52 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1229.eqiad.wmnet with reason: host reimage
  • 12:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1225.eqiad.wmnet with OS bullseye
  • 12:50 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1226.eqiad.wmnet with OS bullseye
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1227.eqiad.wmnet with OS bullseye
  • 12:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1228.eqiad.wmnet with OS bullseye
  • 12:36 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1229.eqiad.wmnet with OS bullseye
  • 12:35 moritzm: created cn=airflow-wikidata-ops group T405557
  • 12:35 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:58 jynus: testing backups after new config deploy T403166
  • 10:33 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:44 jynus: finished deploying new grants T403166
  • 08:28 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:29 jynus: start deploying new backup grants T403166
  • 07:24 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 07:21 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
  • 07:12 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
  • 07:10 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 07:00 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
  • 02:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1230.eqiad.wmnet with OS bullseye
  • 02:59 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:59 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1232.eqiad.wmnet with OS bullseye
  • 02:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1231.eqiad.wmnet with OS bullseye
  • 02:38 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1232.eqiad.wmnet with reason: host reimage
  • 02:38 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1229.eqiad.wmnet with OS bullseye
  • 02:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1226.eqiad.wmnet with OS bullseye
  • 02:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1225.eqiad.wmnet with OS bullseye
  • 02:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1227.eqiad.wmnet with OS bullseye
  • 02:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1230.eqiad.wmnet with reason: host reimage
  • 02:31 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1232.eqiad.wmnet with reason: host reimage
  • 02:31 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1230.eqiad.wmnet with reason: host reimage
  • 02:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1225.eqiad.wmnet with OS bullseye
  • 02:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1226.eqiad.wmnet with OS bullseye
  • 02:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1227.eqiad.wmnet with OS bullseye
  • 02:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1228.eqiad.wmnet with OS bullseye
  • 02:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1229.eqiad.wmnet with OS bullseye
  • 02:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1232.eqiad.wmnet with OS bullseye
  • 02:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1230.eqiad.wmnet with OS bullseye
  • 02:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1232.eqiad.wmnet with OS bullseye
  • 02:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1231.eqiad.wmnet with reason: host reimage
  • 02:13 krinkle@deploy2002: Finished scap sync-world: Backport for Disable inert MobileFrontend on wikimedia.org wikis lacking DNS (part 2) (T152882) (duration: 13m 09s)
  • 02:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1231.eqiad.wmnet with reason: host reimage
  • 02:08 krinkle@deploy2002: krinkle: Continuing with sync
  • 02:07 krinkle@deploy2002: krinkle: Backport for Disable inert MobileFrontend on wikimedia.org wikis lacking DNS (part 2) (T152882) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 02:00 krinkle@deploy2002: Started scap sync-world: Backport for Disable inert MobileFrontend on wikimedia.org wikis lacking DNS (part 2) (T152882)
  • 01:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1231.eqiad.wmnet with OS bullseye
  • 01:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1232.eqiad.wmnet with OS bullseye
  • 01:15 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 53s)
  • 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
  • 00:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 00:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 00:33 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 00:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 00:10 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 00:10 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 00:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 00:03 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply

2025-09-25

  • 23:29 mutante: releases2003 - re-enabling puppet, which was disabled for debugging T405352; then the deployment server failover happened and this server didn't get the update about which server was the active deployment server, which subsequently caused T405646
  • 23:10 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 23:10 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 23:08 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 23:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 22:44 eileen: civicrm upgraded from a53165c1 to 3771abe2
  • 22:09 krinkle@deploy2002: Finished scap sync-world: Backport for Disable inert MobileFrontend on wikimedia.org wikis that lack DNS (T152882) (duration: 13m 52s)
  • 22:04 krinkle@deploy2002: krinkle: Continuing with sync
  • 22:02 krinkle@deploy2002: krinkle: Backport for Disable inert MobileFrontend on wikimedia.org wikis that lack DNS (T152882) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:55 krinkle@deploy2002: Started scap sync-world: Backport for Disable inert MobileFrontend on wikimedia.org wikis that lack DNS (T152882)
  • 21:43 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wdqs[2009,2016].codfw.wmnet,wdqs[1018-1020].eqiad.wmnet with reason: T395772
  • 21:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 21:29 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 21:27 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wdqs[2009,2016].codfw.wmnet,wdqs[1018-1020].eqiad.wmnet with reason: T395772
  • 21:24 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 21:24 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 21:21 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 21:21 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 20:55 cjming: end of UTC late backport window
  • 20:52 cjming@deploy2002: Finished scap sync-world: Backport for xLab: instrument page visits with delayed events (duration: 11m 17s)
  • 20:47 cjming@deploy2002: cjming: Continuing with sync
  • 20:47 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 20:47 cjming@deploy2002: cjming: Backport for xLab: instrument page visits with delayed events synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:46 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 20:44 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 20:44 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 20:41 cjming@deploy2002: Started scap sync-world: Backport for xLab: instrument page visits with delayed events
  • 20:38 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 20:35 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 20:34 mutante: [releases2003:~] $ sudo systemctl reset-failed - monitoring alerted about failed rsync from deploy1003 after active deployment server switched to deploy2002 today - T405646
  • 20:31 dani@deploy2002: Finished scap sync-world: Backport for Pre-deploy reader foundational survey on enwiki (T405410), Deploy Design Research participant recruitment survey on jawiki (T405577) (duration: 12m 35s)
  • 20:26 dani@deploy2002: dani: Continuing with sync
  • 20:24 dani@deploy2002: dani: Backport for Pre-deploy reader foundational survey on enwiki (T405410), Deploy Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 dani@deploy2002: Started scap sync-world: Backport for Pre-deploy reader foundational survey on enwiki (T405410), Deploy Design Research participant recruitment survey on jawiki (T405577)
  • 20:16 mutante: deploy1003 alerted because /srv/ is at 98% - T401647
  • 20:15 sbisson@deploy2002: Finished scap sync-world: Backport for Special:Contribute: configure new page target title for enwiki (T327063) (duration: 11m 31s)
  • 20:10 sbisson@deploy2002: sbisson: Continuing with sync
  • 20:10 sbisson@deploy2002: sbisson: Backport for Special:Contribute: configure new page target title for enwiki (T327063) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 sbisson@deploy2002: Started scap sync-world: Backport for Special:Contribute: configure new page target title for enwiki (T327063)
  • 19:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:36 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:35 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.20 refs T396381
  • 19:31 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 19:30 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 19:24 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 19:19 brennen@deploy2002: Finished scap sync-world: Backport for fix: provide a eventType fallback for already scheduled jobs (T405514), fix: prevent type-error from outdated serialization (T405511) (duration: 54m 29s)
  • 19:11 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 19:05 brennen@deploy2002: sgimeno, migr, brennen: Continuing with sync
  • 19:04 brennen@deploy2002: sgimeno, migr, brennen: Backport for fix: provide a eventType fallback for already scheduled jobs (T405514), fix: prevent type-error from outdated serialization (T405511) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:56 mutante: LDAP - added uid=elishacohenwmde to 'wmde' and 'nda' T404359
  • 18:55 mutante: LDAP - added member: uid=elishacohenwmde,ou=people,dc=wikimedia,dc=org
  • 18:24 brennen@deploy2002: Started scap sync-world: Backport for fix: provide a eventType fallback for already scheduled jobs (T405514), fix: prevent type-error from outdated serialization (T405511)
  • 18:11 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 18:08 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 17:57 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 17:56 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 17:32 jasmine@deploy2002: Finished scap sync-world: Test deployment to validate deployment server switchover - T399891. (duration: 39m 28s)
  • 17:25 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 17:25 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • food: donorwiki upgraded from df2482ce to 52104fab
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw1-f5b-codfw
  • 17:02 pt1979@cumin2002: START - Cookbook sre.network.tls for network device fasw1-f5b-codfw
  • 16:54 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw1-f5b-codfw.mgmt.codfw.wmnet
  • 16:52 jasmine@deploy2002: Started scap sync-world: Test deployment to validate deployment server switchover - T399891.
  • 16:46 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:41 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:41 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:40 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:27 jasmine@dns1004: END - running authdns-update
  • 16:26 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host durum7003.magru.wmnet with OS bookworm
  • 16:25 jasmine@dns1004: START - running authdns-update
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw1-f5b-codfw - pt1979@cumin2002"
  • 16:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw1-f5b-codfw - pt1979@cumin2002"
  • 16:22 jasmine@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet,releases1003.eqiad.wmnet with reason: Deployment server switchover
  • 16:15 jasmine_: stopped spiderpig-apiserver, spiderpig-jobrunner on deploy1003
  • 16:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:13 pt1979@cumin2002: START - Cookbook sre.network.provision for device fasw1-f5b-codfw.mgmt.codfw.wmnet
  • 16:13 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw1-f5a-codfw
  • 16:13 pt1979@cumin2002: START - Cookbook sre.network.tls for network device fasw1-f5a-codfw
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw1-f5a-codfw.mgmt.codfw.wmnet
  • 16:06 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 15:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:41 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum5001.eqsin.wmnet with OS trixie
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw1-f5a-codfw - pt1979@cumin2002"
  • 15:38 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw1-f5a-codfw - pt1979@cumin2002"
  • 15:35 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS bookworm
  • 15:34 sukhe: sudo puppet node deactivate durum7003.magru.wmnet: stuck after reimage with failed puppet run
  • 15:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:33 pt1979@cumin2002: START - Cookbook sre.network.provision for device fasw1-f5a-codfw.mgmt.codfw.wmnet
  • 15:32 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:31 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
  • 15:22 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:22 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 15:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 15:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
  • 15:15 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 15:14 brennen@deploy1003: Finished deploy [phabricator/deployment@5d4a2bb]: deploy phab1004 for T404134 (duration: 03m 49s)
  • 15:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
  • 15:11 brennen@deploy1003: Started deploy [phabricator/deployment@5d4a2bb]: deploy phab1004 for T404134
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
  • 15:06 brennen@deploy1003: Finished deploy [phabricator/deployment@5d4a2bb]: deploy phab2002 for T404134 (duration: 00m 41s)
  • 15:05 brennen@deploy1003: Started deploy [phabricator/deployment@5d4a2bb]: deploy phab2002 for T404134
  • 15:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
  • 15:04 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host durum7003.magru.wmnet with OS trixie
  • 15:01 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
  • 15:01 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
  • 14:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 14:54 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
  • 14:53 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 14:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:48 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:48 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:42 sukhe: merging revert for HTTP1.0 discard on cp1107
  • 14:41 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
  • 14:41 elukey@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: sync
  • 14:41 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:40 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:22 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS trixie
  • 14:22 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum5001.eqsin.wmnet with OS trixie
  • 14:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
  • 14:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
  • 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
  • 14:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
  • 14:11 cdanis@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: search inside inline pattern values - cdanis@cumin1003"
  • 14:11 cdanis@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: search inside inline pattern values - cdanis@cumin1003
  • 14:10 cdanis@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: search inside inline pattern values - cdanis@cumin1003
  • 14:10 cdanis@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: search inside inline pattern values - cdanis@cumin1003"
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
  • 14:06 tgr_: UTC afternoon deploys done
  • 14:04 tgr@deploy1003: Finished scap sync-world: Backport for Enable multibackend session store on beta and testwiki (T402808), Pre-deploy Design Research participant recruitment survey on jawiki (T405577) (duration: 16m 11s)
  • 14:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
  • 13:59 tgr@deploy1003: tgr, d3r1ck01, dani: Continuing with sync
  • 13:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1210.eqiad.wmnet with OS bullseye
  • 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:54 tgr@deploy1003: tgr, d3r1ck01, dani: Backport for Enable multibackend session store on beta and testwiki (T402808), Pre-deploy Design Research participant recruitment survey on jawiki (T405577) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
  • 13:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
  • 13:48 tgr@deploy1003: Started scap sync-world: Backport for Enable multibackend session store on beta and testwiki (T402808), Pre-deploy Design Research participant recruitment survey on jawiki (T405577)
  • 13:43 tgr@deploy1003: Finished scap sync-world: Backport for objectcache: Add a hit/miss flag to CachedBagOStuff, session: Improve logging and monitoring in SessionStore implementations (T399195 T402808), hCaptcha: Fix mock for StatsFactory, NewcomerTasks: Use StatsFactory unit test helper, objectcache: Add a hit/miss flag to Cached… (gerrit:1191350)
  • 13:38 tgr@deploy1003: d3r1ck01, wmde-fisch, tgr: Continuing with sync
  • 13:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:33 tgr@deploy1003: d3r1ck01, wmde-fisch, tgr: Backport for objectcache: Add a hit/miss flag to CachedBagOStuff, session: Improve logging and monitoring in SessionStore implementations (T399195 T402808), hCaptcha: Fix mock for StatsFactory, NewcomerTasks: Use StatsFactory unit test helper, objectcache: Add a hit/miss flag to Cache… (gerrit:1191350)
  • 13:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:27 tgr@deploy1003: Started scap sync-world: Backport for objectcache: Add a hit/miss flag to CachedBagOStuff, session: Improve logging and monitoring in SessionStore implementations (T399195 T402808), hCaptcha: Fix mock for StatsFactory, NewcomerTasks: Use StatsFactory unit test helper, objectcache: Add a hit/miss flag to CachedB… (gerrit:1191350)
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
  • 13:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
  • 13:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
  • 13:14 dreamyjazz@deploy1003: Finished scap sync-world: Backport for CheckUser: Enable SI special page on enwiki and frwiki (T405556) (duration: 14m 04s)
  • 13:09 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 13:07 dreamyjazz@deploy1003: dreamyjazz: Backport for CheckUser: Enable SI special page on enwiki and frwiki (T405556) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 13:00 dreamyjazz@deploy1003: Started scap sync-world: Backport for CheckUser: Enable SI special page on enwiki and frwiki (T405556)
  • 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
  • 12:54 elukey@cumin1003: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for puppetserver1001.eqiad.wmnet: Renew puppet certificate - elukey@cumin1003
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
  • 12:49 jynus: swap read only for db1176/db2230 (test-s4) T403966
  • 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
  • 12:38 elukey@cumin1003: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for puppetserver1001.eqiad.wmnet: Renew puppet certificate - elukey@cumin1003
  • 12:16 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:15 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 12:09 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:09 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
  • 11:47 ladsgroup@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw for all core sections
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
  • 11:42 ladsgroup@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw for all core sections
  • 11:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
  • 11:26 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:25 stevemunene@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1210.eqiad.wmnet with OS bullseye
  • 11:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1210.eqiad.wmnet with OS bullseye
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 09:52 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 09:42 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 09:40 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:39 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 09:39 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 09:38 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:38 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:29 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
  • 09:18 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 09:18 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
  • 09:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 08:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 08:34 jforrester@deploy1003: Finished scap sync-world: Backport for Stop loading the Graph extension anywhere (T362317) (duration: 39m 14s)
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 08:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1030.eqiad.wmnet
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 08:24 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1030.eqiad.wmnet
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 08:22 jforrester@deploy1003: jforrester: Continuing with sync
  • 08:21 jforrester@deploy1003: jforrester: Backport for Stop loading the Graph extension anywhere (T362317) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 08:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 07:58 kart_: staging: Updated cxserver to 2025-09-25-074241-production (T394982)
  • 07:56 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 07:56 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:56 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:55 brouberol@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 07:55 jforrester@deploy1003: Started scap sync-world: Backport for Stop loading the Graph extension anywhere (T362317)
  • 07:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 07:54 jforrester@deploy1003: Finished scap sync-world: Backport for ExperimentXLabManager: allow to re-enroll a user in experiments (T401308) (duration: 18m 03s)
  • 07:52 brouberol@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 07:52 brouberol@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 07:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 07:50 jforrester@deploy1003: jforrester, sgimeno: Continuing with sync
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 07:49 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 07:49 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 07:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 07:42 jforrester@deploy1003: jforrester, sgimeno: Backport for ExperimentXLabManager: allow to re-enroll a user in experiments (T401308) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 07:36 jforrester@deploy1003: mwscript-k8s job started: namespaceDupes mswikiquote --fix # T404700
  • 07:36 jforrester@deploy1003: Started scap sync-world: Backport for ExperimentXLabManager: allow to re-enroll a user in experiments (T401308)
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 07:36 jforrester@deploy1003: Finished scap sync-world: Backport for Growth [testwiki]: enable new notifications and reduce scheduling time (T404085), mswikiquote: set timezone, sitename and project namespace (T404700), mswikiquote: add logo (T404700) (duration: 13m 36s)
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 07:31 jforrester@deploy1003: anzx, jforrester, sgimeno: Continuing with sync
  • 07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 07:28 jforrester@deploy1003: anzx, jforrester, sgimeno: Backport for Growth [testwiki]: enable new notifications and reduce scheduling time (T404085), mswikiquote: set timezone, sitename and project namespace (T404700), mswikiquote: add logo (T404700) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 07:22 jforrester@deploy1003: Started scap sync-world: Backport for Growth [testwiki]: enable new notifications and reduce scheduling time (T404085), mswikiquote: set timezone, sitename and project namespace (T404700), mswikiquote: add logo (T404700)
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 07:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 07:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 06:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 06:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 06:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 06:33 kharlan@deploy1003: Finished scap sync-world: Backport for CheckUser/UserInfoCard: Phase 2 enable by default on pilot wikis (T405342) (duration: 11m 22s)
  • 06:31 gehel: restarting blazegraph on wdqs-main@codfw
  • 06:28 kharlan@deploy1003: kharlan: Continuing with sync
  • 06:26 kharlan@deploy1003: kharlan: Backport for CheckUser/UserInfoCard: Phase 2 enable by default on pilot wikis (T405342) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:21 kharlan@deploy1003: Started scap sync-world: Backport for CheckUser/UserInfoCard: Phase 2 enable by default on pilot wikis (T405342)
  • 05:30 kart_: staging: Updated cxserver to 2025-09-25-051716-production (T394982)
  • 05:27 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:26 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:03 eileen: civicrm upgraded from c1a08ee7 to a53165c1
  • 04:56 eileen: civicrm upgraded from 67156d98 to c1a08ee7
  • 03:35 eileen: civicrm upgraded from c9211920 to 67156d98
  • 02:18 eileen: civicrm upgraded from 6ce95d83 to c9211920
  • 01:33 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:31 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 01:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1210.eqiad.wmnet with OS bullseye
  • 01:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1210.eqiad.wmnet with OS bullseye
  • 01:20 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1210.eqiad.wmnet with OS bullseye
  • 01:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1210.eqiad.wmnet with OS bullseye
  • 01:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1210.eqiad.wmnet with OS bullseye
  • 01:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1210.eqiad.wmnet with OS bullseye
  • 01:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1231.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1229.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1230.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1232.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1216.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:49 eileen: civicrm upgraded from cf8bdf15 to 6ce95d83
  • 00:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1216.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1228.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1227.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1231.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1232.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1230.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1229.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1226.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1225.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1223.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1224.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1228.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1222.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1227.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1221.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:17 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "cirrus: Send more_like traffic to eqiad" (T405394) (duration: 09m 48s)
  • 00:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1226.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1225.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1223.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1218.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1224.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1219.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:12 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 00:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1215.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1217.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:12 ebernhardson@deploy1003: ebernhardson: Backport for Revert "cirrus: Send more_like traffic to eqiad" (T405394) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1220.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:07 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "cirrus: Send more_like traffic to eqiad" (T405394)

2025-09-24

  • 23:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1222.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1221.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:52 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1232
  • 23:52 eileen: civicrm upgraded from 7acd20cd to 56935a5b
  • 23:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1216.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:51 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1232
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1216.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:51 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1231
  • 23:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1216.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1231
  • 23:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1218.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:47 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1219.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1218.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:47 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1218.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1217.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1220.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1217.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1217.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1216.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1215.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:43 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1230
  • 23:42 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1230
  • 23:42 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1229
  • 23:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1211.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1209.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1213.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1229
  • 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1228
  • 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1212.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1210.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:39 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1228
  • 23:39 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1227
  • 23:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1227
  • 23:38 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1226
  • 23:37 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1226
  • 23:37 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1225
  • 23:36 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1225
  • 23:36 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1224
  • 23:34 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1224
  • 23:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1223
  • 23:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1223
  • 23:33 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1222
  • 23:32 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1222
  • 23:32 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1221
  • 23:31 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1221
  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1220
  • 23:30 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1220
  • 23:29 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1219
  • 23:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1219
  • 23:28 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1218
  • 23:27 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1218
  • 23:27 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1217
  • 23:25 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1217
  • 23:25 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1216
  • 23:24 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1216
  • 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1215
  • 23:21 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1215
  • 23:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1213.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1210.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1213.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1214.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1213.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1212.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1210.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1210.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1210.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1211.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1210.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1209.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:08 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns an-worker - jclark@cumin1002"
  • 23:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns an-worker - jclark@cumin1002"
  • 23:03 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:53 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wikivoyage and Wikiversity (group1) (T403510) (duration: 11m 53s)
  • 22:48 krinkle@deploy1003: krinkle: Continuing with sync
  • 22:48 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on Wikivoyage and Wikiversity (group1) (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:41 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Wikivoyage and Wikiversity (group1) (T403510)
  • 21:39 eileen: civicrm upgraded from 9e1ffed5 to 7acd20cd
  • 21:39 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:39 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:38 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:38 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:37 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:37 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:31 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:30 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:30 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:28 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:28 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:27 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:23 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:23 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:22 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:22 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:21 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:21 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:17 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:17 jhathaway: uploaded spicerack_11.8.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 21:17 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:16 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:16 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:15 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:14 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:10 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:09 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:09 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:08 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:07 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:06 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:05 brett@dns1004: END - running authdns-update
  • 21:04 brett@dns1004: START - running authdns-update
  • 20:56 mutante: phab1004/phab2002 - unlink /srv/phab/phabricator/bin/move_project (broken symbolic link to ../scripts/move_beneath.php) T342275
  • 20:42 ejegg: fundraising civicrm upgraded from 5e5445a9 to 9e1ffed5
  • 20:37 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS bookworm
  • 20:36 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 20:31 mfossati@deploy1003: Finished scap sync-world: Backport for Fix typo in ReaderExperiments' ImageBrowsing stream configuration (T403259) (duration: 12m 40s)
  • 20:26 mfossati@deploy1003: mfossati: Continuing with sync
  • 20:25 mfossati@deploy1003: mfossati: Backport for Fix typo in ReaderExperiments' ImageBrowsing stream configuration (T403259) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 mfossati@deploy1003: Started scap sync-world: Backport for Fix typo in ReaderExperiments' ImageBrowsing stream configuration (T403259)
  • 20:18 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2009.mgmt.codfw.wmnet on all recursors
  • 20:18 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache sretest2009.mgmt.codfw.wmnet on all recursors
  • 20:17 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for sretest2009 - cmooney@cumin1003"
  • 20:17 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for sretest2009 - cmooney@cumin1003"
  • 20:16 esanders@deploy1003: Finished scap sync-world: Backport for DbFactory: Use primary DB when running maintenance scripts (T405080), DbFactory: Use primary DB when running maintenance scripts (T405080) (duration: 11m 45s)
  • 20:12 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 20:11 esanders@deploy1003: esanders: Continuing with sync
  • 20:10 esanders@deploy1003: esanders: Backport for DbFactory: Use primary DB when running maintenance scripts (T405080), DbFactory: Use primary DB when running maintenance scripts (T405080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 denisse: Remove kibana.discovery.wmnet from Puppet CA - T364622
  • 20:04 esanders@deploy1003: Started scap sync-world: Backport for DbFactory: Use primary DB when running maintenance scripts (T405080), DbFactory: Use primary DB when running maintenance scripts (T405080)
  • 19:33 ejegg: fundraising civicrm upgraded from 0081bc3f to 5e5445a9
  • 19:16 ladsgroup@dns1004: END - running authdns-update
  • 19:15 ladsgroup@dns1004: START - running authdns-update
  • 18:20 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.20 refs T396381
  • 18:07 brennen: 1.45.0-wmf.20 train status (T396381): logs ok, no current blockers. rolling to group1.
  • 17:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:41 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:57 ejegg: fundraising civicrm upgraded from 1c973d84 to 0081bc3f
  • 15:55 jasmine@deploy1003: Finished scap sync-world: Backport for debug.json: order codfw (primary) DC backends first (T399891) (duration: 13m 49s)
  • 15:52 sukhe@dns1004: END - running authdns-update
  • 15:51 sukhe@dns1004: START - running authdns-update
  • 15:50 jasmine@deploy1003: jasmine: Continuing with sync
  • 15:47 jasmine@deploy1003: jasmine: Backport for debug.json: order codfw (primary) DC backends first (T399891) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:41 jasmine@deploy1003: Started scap sync-world: Backport for debug.json: order codfw (primary) DC backends first (T399891)
  • 15:40 jasmine@deploy1003: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover - T399891 (duration: 64m 41s)
  • 15:35 jasmine@dns1004: END - running authdns-update
  • 15:33 jasmine@dns1004: START - running authdns-update
  • 15:29 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0) for datacenter switchover from eqiad to codfw
  • 15:20 sukhe@dns1004: END - running authdns-update
  • 15:18 sukhe@dns1004: START - running authdns-update
  • 15:17 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from eqiad to codfw
  • 15:17 jasmine@dns1004: END - running authdns-update
  • 15:16 jasmine_: Phase 9: Update DNS records for new database masters
  • 15:15 jasmine@dns1004: START - running authdns-update
  • 15:14 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:14 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:14 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:13 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:13 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:12 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:12 cmooney@dns2005: END - running authdns-update
  • 15:11 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0) for datacenter switchover from eqiad to codfw
  • 15:11 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from eqiad to codfw
  • 15:11 cmooney@dns2005: START - running authdns-update
  • 15:09 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
  • 15:09 cmooney@dns2005: START - running authdns-update
  • 15:09 root@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 15:08 root@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 15:07 root@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:07 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 root@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 15:07 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from eqiad to codfw
  • 15:06 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner (exit_code=0) for datacenter switchover from eqiad to codfw
  • 15:06 root@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: sync
  • 15:06 root@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: sync
  • 15:06 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from eqiad to codfw
  • 15:05 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) for datacenter switchover from eqiad to codfw
  • 15:05 jasmine@cumin1003: MediaWiki read-only period ends at: 2025-09-24 15:05:16.845948
  • 15:02 jasmine@cumin1003: MediaWiki read-only period starts at: 2025-09-24 15:02:35.395589
  • 15:02 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from eqiad to codfw
  • 15:01 cmooney@dns2005: START - running authdns-update
  • 15:01 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
  • 14:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:59 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw
  • 14:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1006.eqiad.wmnet with OS bookworm
  • 14:58 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) for datacenter switchover from eqiad to codfw
  • 14:52 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from eqiad to codfw
  • 14:51 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0) for datacenter switchover from eqiad to codfw
  • 14:51 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from eqiad to codfw
  • 14:47 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
  • 14:35 jasmine@deploy1003: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover - T399891
  • 14:34 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
  • 14:13 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bookworm
  • 14:04 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:04 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:04 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1005.eqiad.wmnet with OS bookworm
  • 14:03 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:03 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:03 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis tokwiki in section s5
  • 14:02 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:56 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tokwiki in section s5
  • 13:49 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 13:48 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:48 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis tokwiki in section s5
  • 13:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tokwiki in section s5
  • 13:46 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
  • 13:46 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
  • 13:45 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate new snippet files for reverse range for 2620:0:861:fe17::/64 - cmooney@cumin1003"
  • 13:41 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate new snippet files for reverse range for 2620:0:861:fe17::/64 - cmooney@cumin1003"
  • 13:38 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 13:37 Emperor: update envoyproxy to 1.29.12 on apus rgw nodes T405469
  • 13:37 cmooney@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:37 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 13:27 Emperor: update envoyproxy to 1.29.12 on thanos-fe nodes T405469
  • 13:27 moritzm: upgrade Envoy on puppet servers T403663
  • 13:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bookworm
  • 13:22 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns - cmooney@cumin1003"
  • 13:22 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tokwiki in section s5
  • 13:22 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns - cmooney@cumin1003"
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2047.codfw.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet
  • 13:14 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 13:14 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 13:12 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet
  • 13:12 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:12 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2047.codfw.wmnet
  • 13:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2046.codfw.wmnet
  • 13:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tokwiki in section s5
  • 13:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
  • 13:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2046.codfw.wmnet
  • 13:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
  • 13:03 mfossati@deploy1003: Finished scap sync-world: Backport for ReaderExperiments' ImageBrowsing stream configuration (T403259) (duration: 23m 53s)
  • 13:03 Emperor: update envoyproxy to 1.29.12 on ms-fe nodes T405469
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2045.codfw.wmnet
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
  • 12:58 mfossati@deploy1003: mfossati: Continuing with sync
  • 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
  • 12:54 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 12:45 mfossati@deploy1003: mfossati: Backport for ReaderExperiments' ImageBrowsing stream configuration (T403259) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2045.codfw.wmnet
  • 12:39 mfossati@deploy1003: Started scap sync-world: Backport for ReaderExperiments' ImageBrowsing stream configuration (T403259)
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2044.codfw.wmnet
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
  • 12:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
  • 12:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2044.codfw.wmnet
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2043.codfw.wmnet
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
  • 12:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
  • 12:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2043.codfw.wmnet
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2042.codfw.wmnet
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
  • 12:05 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 12:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2042.codfw.wmnet
  • 12:04 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2041.codfw.wmnet
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
  • 11:56 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tokwiki in section s5
  • 11:54 stevemunene@dns1004: END - running authdns-update
  • 11:53 stevemunene@dns1004: START - running authdns-update
  • 11:51 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:51 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:50 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:49 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:49 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:48 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2041.codfw.wmnet
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2040.codfw.wmnet
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
  • 11:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2040.codfw.wmnet
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2039.codfw.wmnet
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
  • 11:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2039.codfw.wmnet
  • 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
  • 11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
  • 11:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
  • 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
  • 10:23 claime: Upgraded envoy to v1.29.12 on api-gateway and rest-gateway - T403663
  • 10:21 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:20 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:17 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:17 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:14 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:14 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:12 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:12 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:11 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
  • 10:10 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:10 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:09 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
  • 10:02 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:01 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
  • 10:00 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:00 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 09:58 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
  • 09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
  • 09:57 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
  • 09:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
  • 09:33 moritzm: upgrading perf on bookworm nodes to 6.1.153
  • 09:19 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 09:18 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 09:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:15 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1013.eqiad.wmnet with OS trixie
  • 09:10 moritzm: installing qemu security updates
  • 09:06 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 09:00 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 08:59 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1095
  • 08:59 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1095
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 08:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 08:46 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 08:41 moritzm: failover Ganeti master in esams to ganeti3005
  • 08:40 moritzm: failover Ganeti master in magru to ganeti3005
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 08:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS trixie
  • 08:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 08:14 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum
  • 08:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 08:13 Emperor: VACUUM large container dbs on ms-be1066 T377827
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 07:58 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 07:31 mlitn@deploy1003: Finished scap sync-world: Backport for Add MediaSearch custommatch:linked_from keyword (T403613) (duration: 13m 04s)
  • 07:26 mlitn@deploy1003: mlitn: Continuing with sync
  • 07:25 mlitn@deploy1003: mlitn: Backport for Add MediaSearch custommatch:linked_from keyword (T403613) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:18 mlitn@deploy1003: Started scap sync-world: Backport for Add MediaSearch custommatch:linked_from keyword (T403613)
  • 07:11 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable account creation trial on phase 2 wikis (T402366) (duration: 19m 26s)
  • 07:06 kharlan@deploy1003: kharlan: Continuing with sync
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
  • 06:58 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable account creation trial on phase 2 wikis (T402366) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:52 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable account creation trial on phase 2 wikis (T402366)
  • 06:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
  • 05:35 eileen: civicrm upgraded from 8cdce9e0 to 1c973d84
  • 01:57 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bookworm
  • 01:40 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 01:36 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 01:17 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bookworm
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 58s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 00:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 00:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage

2025-09-23

  • 23:55 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Wikibooks and Wikiquote (group1) (T403510) (duration: 16m 39s)
  • 23:49 krinkle@deploy1003: krinkle: Continuing with sync
  • 23:47 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 23:47 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on Wikibooks and Wikiquote (group1) (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:45 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 23:39 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Wikibooks and Wikiquote (group1) (T403510)
  • 23:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 23:24 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 23:04 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 22:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1046.eqiad.wmnet with OS bookworm
  • 22:00 jgleeson: civicrm upgraded from 4304c138 to 8cdce9e0
  • 21:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1046.eqiad.wmnet with reason: host reimage
  • 21:38 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1046.eqiad.wmnet with reason: host reimage
  • 21:18 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1046.eqiad.wmnet with OS bookworm
  • 21:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 21:11 tgr_: UTC late deploys done
  • 21:10 tgr@deploy1003: Finished scap sync-world: Backport for session: Fix date handling for JWT cookies (T399243 T399200), session: Fix date handling for JWT cookies (T399243 T399200) (duration: 41m 51s)
  • 20:59 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1045.eqiad.wmnet with reason: host reimage
  • 20:57 tgr@deploy1003: tgr: Continuing with sync
  • 20:55 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1045.eqiad.wmnet with reason: host reimage
  • 20:55 tgr@deploy1003: tgr: Backport for session: Fix date handling for JWT cookies (T399243 T399200), session: Fix date handling for JWT cookies (T399243 T399200) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:35 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 20:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bookworm
  • 20:28 tgr@deploy1003: Started scap sync-world: Backport for session: Fix date handling for JWT cookies (T399243 T399200), session: Fix date handling for JWT cookies (T399243 T399200)
  • 20:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 20:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 19:50 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.20 refs T396381
  • 19:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bookworm
  • 19:42 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bookworm
  • 19:35 logmsgbot: brennen Deployed security patch for T405112
  • 19:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 19:19 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 19:19 jgleeson: payments-wiki upgraded from 3e13fadf to bf2864e9
  • 19:18 logmsgbot: brennen Deployed security patch for T405112
  • 19:07 brennen@deploy1003: Finished scap sync-world: Backport for Revert "User: Reduce locking severity of ::getInstanceForUpdate()" (duration: 11m 36s)
  • 19:02 brennen@deploy1003: brennen, tgr: Continuing with sync
  • 19:01 brennen@deploy1003: brennen, tgr: Backport for Revert "User: Reduce locking severity of ::getInstanceForUpdate()" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bookworm
  • 18:55 brennen@deploy1003: Started scap sync-world: Backport for Revert "User: Reduce locking severity of ::getInstanceForUpdate()"
  • 18:53 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1038.eqiad.wmnet with OS bookworm
  • 18:38 ebernhardson@deploy1003: Finished scap sync-world: Backport for cirrus: Send more_like traffic to eqiad (T405394) (duration: 10m 34s)
  • 18:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 18:33 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 18:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 18:32 ebernhardson@deploy1003: ebernhardson: Backport for cirrus: Send more_like traffic to eqiad (T405394) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:28 jgleeson: SmashPig upgraded from f805ba74 to 96afe81c
  • 18:28 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrus: Send more_like traffic to eqiad (T405394)
  • 18:27 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 18:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 18:08 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 18:08 bking@cumin1002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 18:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bookworm
  • 18:05 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
  • 18:03 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 17:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bookworm
  • 17:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 17:20 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 17:18 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 17:18 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 16:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bookworm
  • 16:56 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bookworm
  • 16:40 denisse: Upgrade Envoy to v1.29.12 on titan hosts - T403663
  • 16:39 denisse: Upgrade Envoy to v1.29.12 on prometheus::pop hosts - T403663
  • 16:37 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 16:37 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 16:37 denisse: Upgrade Envoy to v1.29.12 on prometheus hosts - T403663
  • 16:32 denisse: Upgrade Envoy to v1.29.12 on graphite hosts - T403663
  • 16:31 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in eqiad: Moving services to codfw, Southward DC Switchover Day 1 - T399891
  • 16:31 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 16:25 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2093.codfw.wmnet for thread pool rejections - bking@cumin1002 - T399891
  • 16:25 bking@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2093.codfw.wmnet for thread pool rejections - bking@cumin1002 - T399891
  • 16:22 denisse: Upgrade Envoy to v1.29.12 on logstash hosts - T403663
  • 16:20 denisse: Upgrade Envoy to v1.29.12 on grafana hosts - T403663
  • 16:19 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
  • 16:15 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
  • 16:09 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bookworm
  • 16:03 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Moving services to codfw, Southward DC Switchover Day 1 - T399891
  • 16:03 stevemunene@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 16:03 jasmine@cumin1003: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in eqiad: Moving services to codfw, Southward DC Switchover Day 1 - T399891
  • 15:57 stevemunene@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 15:56 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 15:55 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcephosd1025 is no longer failed, I think - andrew@cumin2002"
  • 15:52 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Moving services to codfw, Southward DC Switchover Day 1 - T399891
  • 15:51 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 15:49 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcephosd1025 is no longer failed, I think - andrew@cumin2002"
  • 15:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 15:30 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: Moving traffic to codfw, Southward DC Switchover Day 1, T399891]
  • 15:30 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: Moving traffic to codfw, Southward DC Switchover Day 1, T399891]
  • 15:28 Emperor: restart swift-proxy ms-fe1010 ms-fe2010 ms-fe2011 ms-fe2015 T360913
  • 15:25 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:24 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:24 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:23 swfrench-wmf: upsizing mw-web in advance of services switchover - T399891
  • 15:22 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:21 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 15:18 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:17 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 15:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1044.eqiad.wmnet with reason: host reimage
  • 15:06 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-launcher1003.eqiad.wmnet
  • 15:05 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-launcher1003.eqiad.wmnet with OS bookworm
  • 15:02 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1044.eqiad.wmnet with reason: host reimage
  • 14:56 moritzm: failover Ganeti master in magru to ganeti7001
  • 14:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
  • 14:42 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 14:42 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-launcher1003.eqiad.wmnet with reason: host reimage
  • 14:39 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:36 stevemunene: restart pybal.service on lvs1020 to pickup new druid hosts druid101[2-3] T397441
  • 14:34 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: service=(druid-public-broker),name=druid1012.eqiad.wmnet
  • 14:33 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: service=(druid-public-broker),name=druid1013.eqiad.wmnet
  • 14:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bookworm
  • 14:26 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-launcher1003.eqiad.wmnet with OS bookworm
  • 14:25 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-launcher1003.eqiad.wmnet - btullis@cumin1003"
  • 14:25 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-launcher1003.eqiad.wmnet - btullis@cumin1003"
  • 14:24 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-launcher1003.eqiad.wmnet on all recursors
  • 14:24 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-launcher1003.eqiad.wmnet on all recursors
  • 14:24 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:24 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-launcher1003.eqiad.wmnet - btullis@cumin1003"
  • 14:23 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-launcher1003.eqiad.wmnet - btullis@cumin1003"
  • 14:22 Lucas_WMDE: UTC afternoon backport+config window (special xLab collab edition) done
  • 14:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for lib: Update lib/metrics-platform to f1a18553 (T385180), lib: Update metrics-platform to fc7678c10a1f (T401380), ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851) (duration: 18m 54s)
  • 14:16 lucaswerkmeister-wmde@deploy1003: phuedx, lucaswerkmeister-wmde: Continuing with sync
  • 14:13 ejegg: civicrm upgraded from 6d97bf23 to 4304c138
  • 14:09 lucaswerkmeister-wmde@deploy1003: phuedx, lucaswerkmeister-wmde: Backport for lib: Update lib/metrics-platform to f1a18553 (T385180), lib: Update metrics-platform to fc7678c10a1f (T401380), ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:08 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:08 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host an-launcher1003.eqiad.wmnet
  • 14:08 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
  • 14:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for lib: Update lib/metrics-platform to f1a18553 (T385180), lib: Update metrics-platform to fc7678c10a1f (T401380), ext.xLab: Add mw.xLab.getInstrument() (T401380 T404851)
  • 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
  • 13:57 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 13:52 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be[1086-1088].eqiad.wmnet with reason: awaiting controller swap
  • 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
  • 13:52 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
  • 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
  • 13:42 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [arbcom_plwiki] Add an icon (T391009), CheckUser/UserInfoCard: Phase 1 enable by default on pilot wikis (T405342) (duration: 13m 16s)
  • 13:41 kart_: Updated Recommendation API to 2025-09-23-124706-production (T405004, T404976)
  • 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
  • 13:37 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 13:36 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, superpes, kharlan: Continuing with sync
  • 13:35 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, superpes, kharlan: Backport for [arbcom_plwiki] Add an icon (T391009), CheckUser/UserInfoCard: Phase 1 enable by default on pilot wikis (T405342) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
  • 13:31 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:29 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bookworm
  • 13:28 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [arbcom_plwiki] Add an icon (T391009), CheckUser/UserInfoCard: Phase 1 enable by default on pilot wikis (T405342)
  • 13:25 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum4001.ulsfo.wmnet
  • 13:24 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:24 tchanders@deploy1003: Finished scap sync-world: Backport for Increase the number of shards used for temp user name generation (T404131), Enable temporary accounts on itwiki (T405195) (duration: 14m 19s)
  • 13:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
  • 13:21 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum4001.ulsfo.wmnet
  • 13:20 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum2001.codfw.wmnet
  • 13:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
  • 13:19 tchanders@deploy1003: tchanders: Continuing with sync
  • 13:17 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:16 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum2001.codfw.wmnet
  • 13:16 tchanders@deploy1003: tchanders: Backport for Increase the number of shards used for temp user name generation (T404131), Enable temporary accounts on itwiki (T405195) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:13 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum1002.eqiad.wmnet
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 13:09 tchanders@deploy1003: Started scap sync-world: Backport for Increase the number of shards used for temp user name generation (T404131), Enable temporary accounts on itwiki (T405195)
  • 13:09 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum1002.eqiad.wmnet
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
  • 12:55 claime: Enabling puppet on all cp nodes - T400131
  • 12:50 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:50 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:49 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
  • 12:48 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:42 claime: sudo run-puppet-agent -e 'Deploying multi-dc.lua changes - T400131 - cgoubert'
  • 12:35 claime: Depooling cp1110.eqiad.wmnet for testing - T400131
  • 12:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for Deprecate User::getInstanceForUpdate() (T405231) (duration: 16m 54s)
  • 12:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 12:28 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:27 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:26 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:23 claime: Repooling cp2041.codfw.wmnet for live traffic testing - T400131
  • 12:22 ladsgroup@deploy1003: ladsgroup: Backport for Deprecate User::getInstanceForUpdate() (T405231) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:17 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:17 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:17 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:16 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:16 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:16 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:16 ladsgroup@deploy1003: Started scap sync-world: Backport for Deprecate User::getInstanceForUpdate() (T405231)
  • 12:07 moritzm: failover Ganeti master in codfw02 to ganeti2033
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
  • 11:59 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2050 gradually with 4 steps - Pooling in new host
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
  • 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 11:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:25 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.71 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.70 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.69 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.68 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.67 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.66 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.65 # T405278
  • 11:24 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.207.64 # T405278
  • 11:23 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.71 # T405278
  • 11:23 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.70 # T405278
  • 11:23 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.69 # T405278
  • 11:23 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.68 # T405278
  • 11:23 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.67 # T405278
  • 11:22 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.66 # T405278
  • 11:22 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.65 # T405278
  • 11:22 zabe@deploy1003: mwscript-k8s job started: resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 194.61.233.64 # T405278
  • 11:14 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2050 gradually with 4 steps - Pooling in new host
  • 11:03 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:03 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:02 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:01 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:57 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2050 T402859', diff saved to https://phabricator.wikimedia.org/P83452 and previous config saved to /var/cache/conftool/dbconfig/20250923-105727-fceratto.json
  • 10:52 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:50 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es2050 gradually with 4 steps - Pooling in new host
  • 10:50 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2050 gradually with 4 steps - Pooling in new host
  • 10:48 moritzm: installing Linux 6.1.153 on Bookworm hosts
  • 10:47 zabe: zabe@deploy1003:~$ mwscript createAndPromote.php --wiki=arbcom_plwiki --bureaucrat --sysop --reason="T391009" Msz2001 REDACTED
  • 10:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:28 claime: cgoubert@cp2041:~$ sudo run-puppet-agent -e 'Deploying multi-dc.lua changes - T400131 - cgoubert'
  • 10:18 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) es2050 gradually with 4 steps - Pooling in new host
  • 10:18 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2050 gradually with 4 steps - Pooling in new host
  • 10:16 claime: Depooling cp2041.codfw.wmnet for testing - T400131
  • 10:14 claime: Disabling puppet on all cp nodes - T400131
  • 09:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tokwiki in section s5
  • 08:54 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Backport temporary account message translations to Italian (T405195), Backport temporary account message translations for Italian (T405195), Backport temporary account message translations for Italian (T405195), Backport temporary account message translations to Italian (T405195) (d
  • 08:49 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 08:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 08:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis tokwiki in section s5
  • 08:43 dreamyjazz@deploy1003: dreamyjazz: Backport for Backport temporary account message translations to Italian (T405195), Backport temporary account message translations for Italian (T405195), Backport temporary account message translations for Italian (T405195), Backport temporary account message translations to Italian (T405195) synced to the te
  • 08:36 dreamyjazz@deploy1003: Started scap sync-world: Backport for Backport temporary account message translations to Italian (T405195), Backport temporary account message translations for Italian (T405195), Backport temporary account message translations for Italian (T405195), Backport temporary account message translations to Italian (T405195)
  • 08:32 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tokwiki in section s5
  • 08:17 dcausse@deploy1003: Finished scap sync-world: Backport for Update interwiki map (T391009) (duration: 15m 49s)
  • 08:11 dcausse@deploy1003: dcausse: Continuing with sync
  • 08:07 dcausse@deploy1003: dcausse: Backport for Update interwiki map (T391009) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:01 dcausse@deploy1003: Started scap sync-world: Backport for Update interwiki map (T391009)
  • 07:54 moritzm: installing unbound security updates
  • 07:52 dcausse: T391009: running scap update-interwiki-cache
  • 07:51 dcausse@deploy1003: Finished scap sync-world: Backport for Activate arbcom_plwiki (T391009) (duration: 18m 20s)
  • 07:43 dcausse@deploy1003: superpes, dcausse: Continuing with sync
  • 07:39 dcausse@deploy1003: superpes, dcausse: Backport for Activate arbcom_plwiki (T391009) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:36 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:36 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 07:34 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 07:34 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 07:33 dcausse@deploy1003: Started scap sync-world: Backport for Activate arbcom_plwiki (T391009)
  • 07:29 dcausse: T391009: running extensions/WikimediaMaintenance/addWiki.php --wiki=arbcom_plwiki
  • 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1003.eqiad.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1003.eqiad.wmnet
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2003.codfw.wmnet
  • 07:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2003.codfw.wmnet
  • 06:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2004.codfw.wmnet
  • 06:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2004.codfw.wmnet
  • 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1005.eqiad.wmnet
  • 06:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1005.eqiad.wmnet
  • 06:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 06:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 04:51 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 04:51 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1034.eqiad.wmnet with OS bookworm
  • 04:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
  • 04:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.17 (duration: 01m 16s)
  • 03:47 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.20 refs T396381 (duration: 43m 55s)
  • 03:47 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bookworm
  • 03:45 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS bookworm
  • 03:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
  • 03:25 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
  • 03:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bookworm
  • 03:05 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1032.eqiad.wmnet with OS bookworm
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.20 refs T396381
  • 02:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
  • 02:47 ejegg: fundraising civicrm upgraded from e9ae563f to 6d97bf23
  • 02:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
  • 02:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bookworm
  • 02:22 ejegg: fundraising civicrm upgraded from 2c830cae to e9ae563f
  • 02:21 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bookworm
  • 02:19 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on wikinews wikis (group1) (T403510) (duration: 49m 12s)
  • 02:14 krinkle@deploy1003: krinkle: Continuing with sync
  • 02:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 01:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 01:41 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bookworm
  • 01:40 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1031.eqiad.wmnet with OS bookworm
  • 01:37 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bookworm
  • 01:37 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1031.eqiad.wmnet with OS bookworm
  • 01:36 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on wikinews wikis (group1) (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:30 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on wikinews wikis (group1) (T403510)
  • 01:19 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bookworm
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 30s)
  • 01:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:37 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1190363 T403663 (duration: 05m 12s)
  • 00:33 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1190363 T403663
  • 00:26 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 00:22 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 00:18 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 00:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1030.eqiad.wmnet with OS bookworm
  • 00:14 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 00:12 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
  • 00:12 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
  • 00:11 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
  • 00:11 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply

2025-09-22

  • 23:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
  • 23:54 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
  • 23:47 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on fawiki and metawiki (T403510) (duration: 38m 31s)
  • 23:34 krinkle@deploy1003: krinkle: Continuing with sync
  • 23:33 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on fawiki and metawiki (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:09 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on fawiki and metawiki (T403510)
  • 22:56 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bookworm
  • 22:53 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1029.eqiad.wmnet with OS bookworm
  • 22:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 22:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
  • 22:32 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
  • 22:12 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bookworm
  • 22:11 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1028.eqiad.wmnet with OS bookworm
  • 21:53 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
  • 21:48 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
  • 21:40 maryum: Deployed security fix for T398706
  • 21:33 maryum: Deployed security fix for T403761
  • 21:31 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 21:29 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bookworm
  • 21:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bookworm
  • 21:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
  • 21:02 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
  • 20:54 dani@deploy1003: Finished scap sync-world: Backport for Deploy Newcomers survey on enwiki (T402915), Initial configuration for arbcom_plwiki (T391009), Deploy Parsoid Read Views to 28 Wikipedias (T405016) (duration: 13m 23s)
  • 20:49 dani@deploy1003: arlolra, superpes, dani: Continuing with sync
  • 20:47 dani@deploy1003: arlolra, superpes, dani: Backport for Deploy Newcomers survey on enwiki (T402915), Initial configuration for arbcom_plwiki (T391009), Deploy Parsoid Read Views to 28 Wikipedias (T405016) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bookworm
  • 20:44 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1027.eqiad.wmnet with OS bookworm
  • 20:41 dani@deploy1003: Started scap sync-world: Backport for Deploy Newcomers survey on enwiki (T402915), Initial configuration for arbcom_plwiki (T391009), Deploy Parsoid Read Views to 28 Wikipedias (T405016)
  • 20:38 sbisson@deploy1003: Finished scap sync-world: Backport for SpecialContribute: configure new page target (T327063) (duration: 11m 33s)
  • 20:34 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bookworm
  • 20:33 sbisson@deploy1003: sbisson: Continuing with sync
  • 20:32 sbisson@deploy1003: sbisson: Backport for SpecialContribute: configure new page target (T327063) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:26 sbisson@deploy1003: Started scap sync-world: Backport for SpecialContribute: configure new page target (T327063)
  • 20:25 sbisson@deploy1003: Finished scap sync-world: Backport for eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T405095) (duration: 11m 34s)
  • 20:19 sbisson@deploy1003: sbisson, gergesshamon: Continuing with sync
  • 20:19 sbisson@deploy1003: sbisson, gergesshamon: Backport for eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T405095) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:13 sbisson@deploy1003: Started scap sync-world: Backport for eswiki, commonswiki, wikidata: lift IP cap for edit-a-thon (T405095)
  • 20:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bookworm
  • 19:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
  • 19:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
  • 19:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 19:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bookworm
  • 19:23 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1026.eqiad.wmnet with OS bookworm
  • 19:18 kharlan@deploy1003: Finished scap sync-world: Backport for PrefUpdateInstrumentation: Track PSI related preferences (duration: 17m 47s)
  • 19:13 kharlan@deploy1003: kharlan: Continuing with sync
  • 19:06 kharlan@deploy1003: kharlan: Backport for PrefUpdateInstrumentation: Track PSI related preferences synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:01 kharlan@deploy1003: Started scap sync-world: Backport for PrefUpdateInstrumentation: Track PSI related preferences
  • 18:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bookworm
  • 18:15 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1024.eqiad.wmnet with OS bookworm
  • 18:00 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 17:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: host reimage
  • 17:55 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 17:55 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1024.eqiad.wmnet with reason: host reimage
  • 17:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1024.eqiad.wmnet with OS bookworm
  • 17:36 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1023.eqiad.wmnet with OS bookworm
  • 17:24 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:23 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:18 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: host reimage
  • 17:11 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1023.eqiad.wmnet with reason: host reimage
  • 16:54 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1023.eqiad.wmnet with OS bookworm
  • 16:48 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 16:45 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1020.eqiad.wmnet with OS bookworm
  • 16:43 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 16:41 andrew@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1025.eqiad.wmnet']
  • 16:32 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1025.eqiad.wmnet']
  • 16:31 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 16:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: host reimage
  • 16:22 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: host reimage
  • 16:12 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bookworm
  • 16:10 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 16:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1020.eqiad.wmnet with OS bookworm
  • 16:01 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1019.eqiad.wmnet with OS bookworm
  • 15:45 toyofuku@deploy1003: Finished scap sync-world: Backport for Enable search recommendation on Wikipedia (T402048) (duration: 11m 35s)
  • 15:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
  • 15:40 toyofuku@deploy1003: jdlrobson, toyofuku: Continuing with sync
  • 15:39 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
  • 15:37 toyofuku@deploy1003: jdlrobson, toyofuku: Backport for Enable search recommendation on Wikipedia (T402048) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:33 toyofuku@deploy1003: Started scap sync-world: Backport for Enable search recommendation on Wikipedia (T402048)
  • 15:22 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1019.eqiad.wmnet with OS bookworm
  • 15:19 pt1979@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on fasw2-c8a-codfw,fasw2-c8b-codfw with reason: pfw1-codfw relocation
  • 15:17 pt1979@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pfw1-codfw with reason: pfw1-codfw relocation
  • 15:15 moritzm: installing clamav security updates
  • 15:11 pt1979@cumin2002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ‘pfw1-codfw’ with reason: ‘pfw1
  • 14:32 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:32 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:31 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:31 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 14:21 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:12 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:11 brouberol@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:10 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:09 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:08 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:07 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:58 sukhe: delete list: sectrainings@lists.wikimedia.org [no archives, project obsolete since 2022]
  • 13:54 phuedx@deploy1003: Finished scap sync-world: Backport for Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis" (duration: 12m 25s)
  • 13:49 phuedx@deploy1003: phuedx: Continuing with sync
  • 13:48 phuedx@deploy1003: phuedx: Backport for Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:42 phuedx@deploy1003: Started scap sync-world: Backport for Revert^2 "WikimediaEvents: Disable client-side error logging for certain wikis"
  • 13:39 stevemunene@dns1004: END - running authdns-update
  • 13:37 stevemunene@dns1004: START - running authdns-update
  • 13:37 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:37 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:30 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:30 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:29 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:28 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:28 esanders@deploy1003: Finished scap sync-world: Backport for [enwiki] Throttle exemption for training events (T405069), [gewikimedia] Update logo and wordmark and change sitename (T405147), LQT->Flow converter: Add a dryRun flag, LQT->Flow converter: Add flag to ignore $wgFlowReadOnly, LQT->Flow converter (gerrit:1189858): Skip pages which have
  • 13:23 moritzm: trigger full planet import for maps eqiad/bookworm T381565
  • 13:22 esanders@deploy1003: esanders, superpes, jforrester: Continuing with sync
  • 13:21 esanders@deploy1003: esanders, superpes, jforrester: Backport for [enwiki] Throttle exemption for training events (T405069), [gewikimedia] Update logo and wordmark and change sitename (T405147), LQT->Flow converter: Add a dryRun flag, LQT->Flow converter: Add flag to ignore $wgFlowReadOnly, LQT->Flow converter (gerrit:1189858): Skip pages whic
  • 13:17 esanders@deploy1003: Started scap sync-world: Backport for [enwiki] Throttle exemption for training events (T405069), [gewikimedia] Update logo and wordmark and change sitename (T405147), LQT->Flow converter: Add a dryRun flag, LQT->Flow converter: Add flag to ignore $wgFlowReadOnly, LQT->Flow converter (gerrit:1189858): Skip pages which have
  • 13:10 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:09 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:04 brouberol@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:04 brouberol@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:04 dreamyjazz@deploy1003: Finished scap sync-world: Backport for SI: Add a configuration flag to hide SI even if the feature is enabled (T405076), CheckUser: Enable SI on enwiki and frwiki while hiding special page (T405109) (duration: 23m 58s)
  • 12:58 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 12:56 brouberol@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:56 brouberol@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:46 dreamyjazz@deploy1003: dreamyjazz: Backport for SI: Add a configuration flag to hide SI even if the feature is enabled (T405076), CheckUser: Enable SI on enwiki and frwiki while hiding special page (T405109) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 12:40 dreamyjazz@deploy1003: Started scap sync-world: Backport for SI: Add a configuration flag to hide SI even if the feature is enabled (T405076), CheckUser: Enable SI on enwiki and frwiki while hiding special page (T405109)
  • 12:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 12:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 12:17 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
  • 11:50 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
  • 11:41 moritzm: upgrading Envoy on puppetboard hosts T403663
  • 11:41 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
  • 11:19 zabe@deploy1003: mwscript-k8s job started: extensions/CentralAuth/maintenance/attachAccount.php --wiki=thwikimedia --userlist users.txt
  • 11:15 zabe@deploy1003: Finished scap sync-world: Backport for Attach thwikimedia to SUL (T400001), Set timezone and project namespace for thwikimedia (T400001) (duration: 11m 55s)
  • 11:09 zabe@deploy1003: zabe: Continuing with sync
  • 11:08 zabe@deploy1003: zabe: Backport for Attach thwikimedia to SUL (T400001), Set timezone and project namespace for thwikimedia (T400001) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:03 zabe@deploy1003: Started scap sync-world: Backport for Attach thwikimedia to SUL (T400001), Set timezone and project namespace for thwikimedia (T400001)
  • 10:48 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tokwiki, mswikiquote, thwikimedia in section s5
  • 10:48 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis tokwiki in section s5
  • 10:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tokwiki in section s5
  • 10:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-backup-datanode[1032,1034-1046].eqiad.wmnet
  • 10:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode[1032,1034-1046].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 10:36 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode[1032,1034-1046].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 10:32 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:29 fabfur: temporary updating haproxykafka version on cp5021 to check for non-parsable character (T404427)
  • 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps1011.eqiad.wmnet with OS bookworm
  • 10:08 moritzm: installing qemu security updates
  • 09:59 joelyrookewmde: Finished populateSitesTable for mswikiquote (T404704)
  • 09:56 moritzm: installing modsecurity-apache security updates
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1011.eqiad.wmnet with reason: host reimage
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1011.eqiad.wmnet with reason: host reimage
  • 09:45 moritzm: installing imagemagick security updates
  • 09:43 ladsgroup@deploy1003: Finished scap sync-world: Backport for User: Reduce locking severity of ::getInstanceForUpdate() (duration: 11m 59s)
  • 09:39 klausman@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool inference in eqiad: maintenance
  • 09:38 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 09:37 ladsgroup@deploy1003: ladsgroup: Backport for User: Reduce locking severity of ::getInstanceForUpdate() synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:36 joelyrookewmde@deploy1003: mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # T404704
  • 09:34 klausman@cumin2002: START - Cookbook sre.discovery.service-route pool inference in eqiad: maintenance
  • 09:31 ladsgroup@deploy1003: Started scap sync-world: Backport for User: Reduce locking severity of ::getInstanceForUpdate()
  • 09:30 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:29 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-backup-datanode[1032,1034-1046].eqiad.wmnet
  • 09:29 moritzm: installing libfastjson security updates
  • 09:28 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:27 klausman@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool inference in eqiad: maintenance
  • 09:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps1011.eqiad.wmnet with OS bookworm
  • 09:23 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Log error message from upstream (duration: 14m 53s)
  • 09:22 klausman@cumin2002: START - Cookbook sre.discovery.service-route depool inference in eqiad: maintenance
  • 09:18 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:18 kharlan@deploy1003: kharlan: Continuing with sync
  • 09:18 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:16 moritzm: prune now obsolete nginx packages from install* T329529
  • 09:14 kharlan@deploy1003: kharlan: Backport for hCaptcha: Log error message from upstream synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:11 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:09 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:08 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Log error message from upstream
  • 09:06 klausman@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 09:06 klausman@cumin2002: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 09:05 klausman@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check 2 services: maintenance
  • 09:05 klausman@cumin2002: START - Cookbook sre.discovery.service-route check 2 services: maintenance
  • 09:05 tappof: bump space for prometheus k8s-dse in eqiad
  • 09:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:01 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 08:53 awight: finished special dewiki sub-referencing window
  • 08:53 awight@deploy1003: Finished scap sync-world: Backport for Fix pulsating dot not disappearing for logged in users (T403693) (duration: 12m 26s)
  • 08:47 awight@deploy1003: awight, wmde-fisch: Continuing with sync
  • 08:46 awight@deploy1003: awight, wmde-fisch: Backport for Fix pulsating dot not disappearing for logged in users (T403693) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:40 awight@deploy1003: Started scap sync-world: Backport for Fix pulsating dot not disappearing for logged in users (T403693)
  • 08:33 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:23 awight@deploy1003: Finished scap sync-world: Backport for Enable sub-references on dewiki (T398669) (duration: 17m 26s)
  • 08:18 awight@deploy1003: wmde-fisch, awight: Continuing with sync
  • 08:12 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:12 awight@deploy1003: wmde-fisch, awight: Backport for Enable sub-references on dewiki (T398669) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:06 awight@deploy1003: Started scap sync-world: Backport for Enable sub-references on dewiki (T398669)
  • 08:02 moritzm: upgrading Envoy on webperf hosts T403663
  • 08:02 awight: beginning dewiki sub-referencing deployment window
  • 08:00 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 07:39 phuedx@deploy1003: Finished scap sync-world: Backport for xLab: Fix instrument to produce valid events (T404420) (duration: 15m 28s)
  • 07:31 phuedx@deploy1003: phuedx: Continuing with sync
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install1004.wikimedia.org
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:29 phuedx@deploy1003: phuedx: Backport for xLab: Fix instrument to produce valid events (T404420) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:23 phuedx@deploy1003: Started scap sync-world: Backport for xLab: Fix instrument to produce valid events (T404420)
  • 07:18 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable on API account creation on test2wiki (T405107) (duration: 37m 41s)
  • 07:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install1004.wikimedia.org
  • 07:05 kharlan@deploy1003: kharlan: Continuing with sync
  • 07:04 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable on API account creation on test2wiki (T405107) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:40 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable on API account creation on test2wiki (T405107)
  • 01:29 eileen: civicrm upgraded from 793af994 to 2c830cae
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 14s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-09-21

  • 18:40 ryankemper: T395772 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1189979 to fix puppet failures on deploy servers
  • 18:20 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs2009` to restore service to https://query-legacy-full.wikidata.org/
  • 18:15 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wdqs[2009,2016].codfw.wmnet,wdqs[1018-1020].eqiad.wmnet with reason: T395772
  • 01:01 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 01m 02s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-09-20

  • 08:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 37s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-09-19

  • 18:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 18:07 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove sretest2009 - cmooney@cumin1003"
  • 18:07 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove sretest2009 - cmooney@cumin1003"
  • 17:59 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 17:56 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest2009.codfw.wmnet
  • 17:56 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cmooney@cumin1003"
  • 17:56 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cmooney@cumin1003"
  • 17:51 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 17:48 cmooney@cumin1003: START - Cookbook sre.hosts.decommission for hosts sretest2009.codfw.wmnet
  • 17:36 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "force sync to remove sretest2009 - cmooney@cumin1003"
  • 17:34 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "force sync to remove sretest2009 - cmooney@cumin1003"
  • 17:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s1 to RW', diff saved to https://phabricator.wikimedia.org/P83443 and previous config saved to /var/cache/conftool/dbconfig/20250919-171624-ladsgroup.json
  • 17:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS trixie
  • 17:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS trixie
  • 17:09 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS bookworm
  • 17:04 taavi@cumin1003: dbctl commit (dc=all): 'set s1 ro', diff saved to https://phabricator.wikimedia.org/P83441 and previous config saved to /var/cache/conftool/dbconfig/20250919-170402-taavi.json
  • 17:02 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 16:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 16:54 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2009.codfw.wmnet with OS bookworm
  • 16:52 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 16:29 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 16:29 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2009.codfw.wmnet with OS bookworm
  • 16:26 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 16:25 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2009.codfw.wmnet with OS bookworm
  • 16:14 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 16:04 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1079-1094].eqiad.wmnet
  • 16:04 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1079-1094].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 16:03 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1079-1094].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 15:52 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 15:52 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-misc2001.codfw.wmnet with OS bookworm
  • 15:38 cmooney@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync subnet info for vlan in codfw rack e2 - cmooney@cumin1003"
  • 15:38 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync subnet info for vlan in codfw rack e2 - cmooney@cumin1003"
  • 15:31 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for sretest2009 - cmooney@cumin1003"
  • 15:30 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for sretest2009 - cmooney@cumin1003"
  • 15:25 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 15:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:12 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:01 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:58 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:53 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:31 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 14:30 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:21 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker[1079-1094].eqiad.wmnet
  • 14:20 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:17 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
  • 13:11 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:37 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:14 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:14 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:13 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 10:12 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
  • 10:11 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:11 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
  • 10:03 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:02 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
  • 09:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
  • 09:41 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-backup-datanode[1001-1007].eqiad.wmnet
  • 09:41 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode[1001-1007].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:40 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode[1001-1007].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:36 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:28 moritzm: installing distro-info-data updates on Bullseye
  • 09:08 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-backup-datanode[1001-1007].eqiad.wmnet
  • 09:08 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:06 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 08:35 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/echoserver: apply
  • 08:35 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/echoserver: apply
  • 08:33 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:32 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:03 eileen: civicrm upgraded from 108ab009 to 2ddf5ea5
  • 07:31 eileen: civicrm upgraded from 541264cd to 108ab009
  • 06:47 moritzm: trigger full planet import for maps codfw/bookworm T381565
  • 05:17 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2017.codfw.wmnet with OS bullseye
  • 05:13 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 05:12 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 05:12 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 05:11 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 05:09 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 05:09 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 03:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2017
  • 03:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2017
  • 03:56 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2017
  • 03:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2017.codfw.wmnet 154.32.192.10.in-addr.arpa 4.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 03:56 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2017.codfw.wmnet 154.32.192.10.in-addr.arpa 4.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 03:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2017 - ryankemper@cumin2002"
  • 03:55 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2017 - ryankemper@cumin2002"
  • 03:45 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 03:44 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2017
  • 03:44 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2017.codfw.wmnet with OS bullseye
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 56s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-09-18

  • 23:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 23:53 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2016.codfw.wmnet with reason: host reimage
  • 23:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2016
  • 23:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2016
  • 23:35 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2016
  • 23:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2016.codfw.wmnet 193.16.192.10.in-addr.arpa 3.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 23:35 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2016.codfw.wmnet 193.16.192.10.in-addr.arpa 3.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 23:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2016 - ryankemper@cumin2002"
  • 23:35 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2016 - ryankemper@cumin2002"
  • 23:33 mutante: upgrading envoyproxy on production phabricator (phab1004) - T403663
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 23:27 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2016
  • 23:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2016.codfw.wmnet with OS bullseye
  • 23:22 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 23:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1043.eqiad.wmnet with reason: host reimage
  • 22:59 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1043.eqiad.wmnet with reason: host reimage
  • 22:40 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 22:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 22:15 ryankemper@cumin1002: conftool action : GET; selector: name=wdqs2009.codfw.wmnet
  • 22:13 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3074.*
  • 21:53 eileen: civicrm upgraded from 1448ace2 to d2f459cb
  • 21:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 21:44 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 21:34 brett: Deleting wdqs, wdqs-heavy-queries, and wdqs-ssl ipvs services from A:lvs-low-traffic-eqiad - T395772
  • 21:33 brett: Deleting wdqs, wdqs-heavy-queries, and wdqs-ssl ipvs services from A:lvs-low-traffic-codfw - T395772
  • 21:32 brett: Deleting wdqs, wdqs-heavy-queries, and wdqs-ssl ipvs services from A:lvs-secondary-eqiad - T395772
  • 21:30 brett: Deleting wdqs, wdqs-heavy-queries, and wdqs-ssl ipvs services from A:lvs-secondary-codfw - T395772
  • 21:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 21:25 brett: Restarting pybal on low-traffic eqiad/codfw lvs servers - T395772
  • 21:18 brett: Restarting pybal on secondary eqiad/codfw lvs servers - T395772
  • 21:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bookworm
  • 21:09 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 ejegg: payments-wiki upgraded from 7fe92797 to 3e13fadf
  • 21:08 brett@dns1004: END - running authdns-update
  • 21:04 jgleeson: SmashPig upgraded from 9d901f99 to f805ba74
  • 21:02 jgleeson: payments-wiki upgraded from a31d7db6 to 7fe92797
  • 20:58 brett@dns1004: START - running authdns-update
  • 20:57 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:57 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 20:39 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 20:17 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bookworm
  • 19:17 ejegg: payments-wiki upgraded from 1b7a47a6 to a31d7db6
  • 18:45 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.19 refs T396380
  • 18:28 jhuneidi@deploy1003: Finished scap sync-world: Backport for Fix ip_reputation.score validation errors in production (T403664) (duration: 17m 17s)
  • 18:24 damilare: donorwiki upgraded from 10d200b1 to df2482ce
  • 18:23 jhuneidi@deploy1003: kharlan, jhuneidi: Continuing with sync
  • 18:18 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Only present rename icon on eligible entities - swfrench@cumin2002"
  • 18:18 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Only present rename icon on eligible entities - swfrench@cumin2002
  • 18:17 jhuneidi@deploy1003: kharlan, jhuneidi: Backport for Fix ip_reputation.score validation errors in production (T403664) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:17 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Only present rename icon on eligible entities - swfrench@cumin2002
  • 18:17 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Only present rename icon on eligible entities - swfrench@cumin2002"
  • 18:12 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:12 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:11 jhuneidi@deploy1003: Started scap sync-world: Backport for Fix ip_reputation.score validation errors in production (T403664)
  • 18:07 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:07 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:06 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:06 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:02 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: WIP
  • 18:01 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: WIP
  • 17:32 mutante: upgrading envoyproxy on vrts1003 (active ticket.wikimedia.org) T403663
  • 17:19 mutante: upgrading envoyproxy on lists1004 (active lists server) T403663
  • 17:15 mutante: upgrading envoyproxy on aphlict1002 (active phab notifications) and contint2002 (active CI) T403663
  • 17:02 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:36 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:32 jasmine@deploy1003: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover - T399891 (duration: 16m 26s)
  • 16:24 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from codfw to eqiad
  • 16:24 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:23 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from codfw to eqiad
  • 16:23 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:23 root@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 16:23 root@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 16:23 root@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
  • 16:23 root@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
  • 16:23 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from codfw to eqiad
  • 16:22 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:22 root@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: sync
  • 16:22 root@deploy1003: helmfile [codfw] START helmfile.d/services/mw-jobrunner: sync
  • 16:22 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from codfw to eqiad
  • 16:21 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:21 jasmine@cumin1003: [DRY-RUN] MediaWiki read-only period ends at: 2025-09-18 16:21:21.591133
  • 16:21 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from codfw to eqiad
  • 16:21 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:21 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite for datacenter switchover from codfw to eqiad
  • 16:20 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:20 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki for datacenter switchover from codfw to eqiad
  • 16:20 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:19 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly for datacenter switchover from codfw to eqiad
  • 16:19 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:19 jasmine@cumin1003: [DRY-RUN] MediaWiki read-only period starts at: 2025-09-18 16:19:18.465479
  • 16:19 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from codfw to eqiad
  • 16:17 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:17 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad
  • 16:17 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:16 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:16 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:16 jasmine@deploy1003: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover - T399891
  • 16:13 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:13 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:11 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from codfw to eqiad
  • 16:10 jasmine@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0) for datacenter switchover from codfw to eqiad
  • 16:10 jasmine@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from codfw to eqiad
  • 16:03 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:41 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Log hcaptcha.execute() events (T402767) (duration: 12m 20s)
  • 15:39 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b41bbe7] (releasing): Update Jenkins version (duration: 00m 42s)
  • 15:39 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b41bbe7] (releasing): Update Jenkins version
  • 15:36 kharlan@deploy1003: kharlan: Continuing with sync
  • 15:35 kharlan@deploy1003: kharlan: Backport for hCaptcha: Log hcaptcha.execute() events (T402767) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:29 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Log hcaptcha.execute() events (T402767)
  • 15:24 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 7 hosts with reason: reboot cr1-codfw as requested by Juniper
  • 15:24 zabe: zabe@deploy1003:~$ mwscript createAndPromote.php --wiki=thwikimedia --bureaucrat --sysop --reason="T400001" Sarawut.Kha REDACTED
  • 15:23 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2011.codfw.wmnet with OS bookworm
  • 15:21 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:21 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:17 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:07 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1022.eqiad.wmnet with OS bookworm
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 14:58 topranks: drain cr1-codfw of traffic before work to test power supplies T401937
  • 14:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 14:50 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b41bbe7] (releasing): Test deploy (duration: 00m 30s)
  • 14:49 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b41bbe7] (releasing): Test deploy
  • 14:49 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: host reimage
  • 14:46 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b41bbe7] (releasing): Test deploy (duration: 00m 19s)
  • 14:46 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b41bbe7] (releasing): Test deploy
  • 14:43 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1022.eqiad.wmnet with reason: host reimage
  • 14:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-backup-namenode[1001-1002].eqiad.wmnet
  • 14:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-namenode[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 14:37 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-namenode[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 14:37 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 14:34 moritzm: upgrading Envoy on cloudweb hosts T403663
  • 14:33 jforrester@deploy1003: Finished scap sync-world: Backport for Graph: Use new placeholder i18n from WikimediaMessages (T362317) (duration: 11m 40s)
  • 14:28 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:26 elukey@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:26 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1022.eqiad.wmnet with OS bookworm
  • 14:24 jforrester@deploy1003: jforrester: Backport for Graph: Use new placeholder i18n from WikimediaMessages (T362317) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:22 jforrester@deploy1003: Started scap sync-world: Backport for Graph: Use new placeholder i18n from WikimediaMessages (T362317)
  • 14:18 jforrester@deploy1003: Finished scap sync-world: Backport for test2wiki: Enable Wikifunctions client mode here too (T397401) (duration: 11m 16s)
  • 14:17 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:16 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:16 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2048.codfw.wmnet']
  • 14:12 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:11 jforrester@deploy1003: jforrester: Backport for test2wiki: Enable Wikifunctions client mode here too (T397401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:06 jforrester@deploy1003: Started scap sync-world: Backport for test2wiki: Enable Wikifunctions client mode here too (T397401)
  • 14:05 zabe@deploy1003: Finished scap sync-world: Backport for Activate thwikimedia (T400001), Update interwiki cache (duration: 08m 39s)
  • 13:59 zabe@deploy1003: zabe: Continuing with sync
  • 13:58 zabe@deploy1003: zabe: Backport for Activate thwikimedia (T400001), Update interwiki cache synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:56 zabe@deploy1003: Started scap sync-world: Backport for Activate thwikimedia (T400001), Update interwiki cache
  • 13:52 zabe@deploy1003: Finished scap sync-world: Backport for Initial configuration for thwikimedia (T400001) (duration: 10m 34s)
  • 13:51 moritzm: imported imposm 0.14.1-3 (cherrypick of upstream fix to hopefully fix deadlock in OSM import) T381565
  • 13:51 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:49 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:47 zabe@deploy1003: zabe: Continuing with sync
  • 13:46 zabe@deploy1003: zabe: Backport for Initial configuration for thwikimedia (T400001) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:42 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for thwikimedia (T400001)
  • 13:38 zabe@deploy1003: Finished scap sync-world: Backport for Activate mswikiquote (T404698) (duration: 10m 58s)
  • 13:37 ladsgroup@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw for all core sections
  • 13:36 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-backup-namenode[1001-1002].eqiad.wmnet
  • 13:33 zabe@deploy1003: zabe: Continuing with sync
  • 13:32 zabe@deploy1003: zabe: Backport for Activate mswikiquote (T404698) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:27 zabe@deploy1003: Started scap sync-world: Backport for Activate mswikiquote (T404698)
  • 13:25 zabe@deploy1003: Finished scap sync-world: Backport for Initial configuration for mswikiquote (T404698) (duration: 07m 56s)
  • 13:24 kart_: Updated Recommendation API to 2025-09-15-194552-production (T404223, T404448, T400562)
  • 13:21 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:20 zabe@deploy1003: zabe: Continuing with sync
  • 13:19 zabe@deploy1003: zabe: Backport for Initial configuration for mswikiquote (T404698) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:17 ladsgroup@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw for all core sections
  • 13:17 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for mswikiquote (T404698)
  • 13:16 zabe@deploy1003: Finished scap sync-world: Backport for Release CampaignEvents extension to Wikimedia Commons - Sept 18 (T403667) (duration: 11m 15s)
  • 13:16 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:11 zabe@deploy1003: zabe, cmelo: Continuing with sync
  • 13:11 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62856
  • 13:10 zabe@deploy1003: zabe, cmelo: Backport for Release CampaignEvents extension to Wikimedia Commons - Sept 18 (T403667) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:05 zabe@deploy1003: Started scap sync-world: Backport for Release CampaignEvents extension to Wikimedia Commons - Sept 18 (T403667)
  • 13:04 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:04 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:01 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 62856
  • 12:49 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:43 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:42 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for Do not access bundle on non-Parsoid content (T404902) (duration: 16m 33s)
  • 12:07 moritzm: installing libyaml-libyaml-perl security updates
  • 12:06 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1192* gradually with 4 steps - Work done
  • 12:06 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1203* gradually with 4 steps - Work done
  • 12:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool es2027', diff saved to https://phabricator.wikimedia.org/P83433 and previous config saved to /var/cache/conftool/dbconfig/20250918-120441-ladsgroup.json
  • 12:02 moritzm: installing libjson-xs-perl security updates
  • 12:02 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:57 ladsgroup@deploy1003: ladsgroup: Backport for Do not access bundle on non-Parsoid content (T404902) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:51 ladsgroup@deploy1003: Started scap sync-world: Backport for Do not access bundle on non-Parsoid content (T404902)
  • 11:47 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 11:46 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 11:46 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 11:44 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 11:43 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 11:42 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
  • 11:21 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1192* gradually with 4 steps - Work done
  • 11:21 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1203* gradually with 4 steps - Work done
  • 10:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1203', diff saved to https://phabricator.wikimedia.org/P83426 and previous config saved to /var/cache/conftool/dbconfig/20250918-105922-ladsgroup.json
  • 10:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1192', diff saved to https://phabricator.wikimedia.org/P83425 and previous config saved to /var/cache/conftool/dbconfig/20250918-105905-ladsgroup.json
  • 10:45 topranks: drain ssw1-f1-eqiad of traffic to perform reboot T400783
  • 10:40 btullis@deploy1003: Finished deploy [analytics/refinery@5feb53f] (thin): Regular analytics weekly train THIN [analytics/refinery@5feb53f9] (duration: 00m 55s)
  • 10:39 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on install1004.wikimedia.org with reason: being shut down
  • 10:39 btullis@deploy1003: Started deploy [analytics/refinery@5feb53f] (thin): Regular analytics weekly train THIN [analytics/refinery@5feb53f9]
  • 10:37 btullis@deploy1003: Finished deploy [analytics/refinery@5feb53f]: Regular analytics weekly train [analytics/refinery@5feb53f9] (duration: 05m 08s)
  • 10:32 btullis@deploy1003: Started deploy [analytics/refinery@5feb53f]: Regular analytics weekly train [analytics/refinery@5feb53f9]
  • 10:27 btullis@deploy1003: Finished deploy [analytics/refinery@5feb53f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@5feb53f9] (duration: 00m 50s)
  • 10:26 btullis@deploy1003: Started deploy [analytics/refinery@5feb53f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@5feb53f9]
  • 10:19 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 10:19 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 10:18 jayme@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 10:18 jayme@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:34 slyngshede@dns1004: END - running authdns-update
  • 08:33 slyngshede@dns1004: START - running authdns-update
  • 07:50 jmm@dns1004: END - running authdns-update
  • 07:48 jmm@dns1004: START - running authdns-update
  • 07:22 dcausse@deploy1003: Finished scap sync-world: Backport for cirrus: Reduce galleries weight in search on commons (T401590) (duration: 16m 20s)
  • 07:17 dcausse@deploy1003: dcausse, ebernhardson: Continuing with sync
  • 07:12 dcausse@deploy1003: dcausse, ebernhardson: Backport for cirrus: Reduce galleries weight in search on commons (T401590) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:06 dcausse@deploy1003: Started scap sync-world: Backport for cirrus: Reduce galleries weight in search on commons (T401590)
  • 06:34 jynus@cumin1003: dbctl commit (dc=all): 'Depool es2027 T404940', diff saved to https://phabricator.wikimedia.org/P83420 and previous config saved to /var/cache/conftool/dbconfig/20250918-063436-jynus.json
  • 06:08 kart_: Updated cxserver to 2025-09-16-161231-production (T394008, T404567, T404298, T404181)
  • 06:06 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:05 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2027.codfw.wmnet onto es2050.codfw.wmnet
  • 06:05 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
  • 06:05 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:04 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:59 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:19 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2027 gradually with 4 steps - Pool es2027.codfw.wmnet in after cloning
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 46s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-09-17

  • 23:37 kemayo@deploy1003: Finished scap sync-world: Backport for Paste check: log when a paste check would have been shown if enabled (T402460) (duration: 12m 38s)
  • 23:32 kemayo@deploy1003: kemayo: Continuing with sync
  • 23:31 kemayo@deploy1003: kemayo: Backport for Paste check: log when a paste check would have been shown if enabled (T402460) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:24 kemayo@deploy1003: Started scap sync-world: Backport for Paste check: log when a paste check would have been shown if enabled (T402460)
  • 22:46 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on cawiki, hewiki, itwiki (group1) (T403510) (duration: 12m 39s)
  • 22:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1021.eqiad.wmnet with OS bookworm
  • 22:41 krinkle@deploy1003: krinkle: Continuing with sync
  • 22:40 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on cawiki, hewiki, itwiki (group1) (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:34 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on cawiki, hewiki, itwiki (group1) (T403510)
  • 22:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1021.eqiad.wmnet with reason: host reimage
  • 22:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump weight of db1167 in general group (T403966)', diff saved to https://phabricator.wikimedia.org/P83415 and previous config saved to /var/cache/conftool/dbconfig/20250917-222207-ladsgroup.json
  • 22:21 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1021.eqiad.wmnet with reason: host reimage
  • 22:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1209 and db2163 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83414 and previous config saved to /var/cache/conftool/dbconfig/20250917-222107-ladsgroup.json
  • 22:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1192 and db2166 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83413 and previous config saved to /var/cache/conftool/dbconfig/20250917-221924-ladsgroup.json
  • 22:10 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 22:00 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 21:55 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bookworm
  • 21:14 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1021.eqiad.wmnet with OS bookworm
  • 20:58 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:56 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 20:49 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS bookworm
  • 20:49 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker2003.codfw.wmnet with OS bookworm
  • 20:35 cjming: end of UTC late backport window
  • 20:30 cjming@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: Enable experiment enrollment hoisting for MinT for Wiki Readers stream (duration: 11m 37s)
  • 20:30 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:27 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 20:25 cjming@deploy1003: cjming, phuedx: Continuing with sync
  • 20:24 cjming@deploy1003: cjming, phuedx: Backport for EventStreamConfig: Enable experiment enrollment hoisting for MinT for Wiki Readers stream synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 cjming@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Enable experiment enrollment hoisting for MinT for Wiki Readers stream
  • 20:16 eileen: civicrm upgraded from 6756d5b0 to 1448ace2
  • 20:15 esanders@deploy1003: Finished scap sync-world: Backport for Enable Flow in read-only mode on wikis using LiquidThreads (T404687) (duration: 11m 10s)
  • 20:12 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bookworm
  • 20:12 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1021.eqiad.wmnet with OS bookworm
  • 20:10 esanders@deploy1003: esanders: Continuing with sync
  • 20:08 esanders@deploy1003: esanders: Backport for Enable Flow in read-only mode on wikis using LiquidThreads (T404687) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 esanders@deploy1003: Started scap sync-world: Backport for Enable Flow in read-only mode on wikis using LiquidThreads (T404687)
  • 20:02 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:02 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: PDU IP added to eqiad - vriley@cumin1003"
  • 20:01 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: PDU IP added to eqiad - vriley@cumin1003"
  • 20:01 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1021.eqiad.wmnet with OS bookworm
  • 19:58 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 19:07 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Log open events to Prometheus (T402767), hCaptcha: Log open events to Prometheus (T402767) (duration: 11m 28s)
  • 19:04 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:02 kharlan@deploy1003: kharlan: Continuing with sync
  • 19:01 kharlan@deploy1003: kharlan: Backport for hCaptcha: Log open events to Prometheus (T402767), hCaptcha: Log open events to Prometheus (T402767) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:55 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Log open events to Prometheus (T402767), hCaptcha: Log open events to Prometheus (T402767)
  • 18:53 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:17 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.19 refs T396380
  • 18:07 mutante: upgrading envoyproxy on etherpad* and stewards* hosts T403663
  • 17:46 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1236
  • 17:45 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1236
  • 17:37 swfrench-wmf: migrated shellbox-video to PHP 8.3 - T403284
  • 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 17:35 mutante: upgrading envoyproxy on planet* and people* hosts T403663
  • 17:33 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 17:32 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 17:29 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 17:29 mutante: upgrading envoyproxy on zuul* hosts T403663
  • 17:29 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 17:24 mutante: upgrading envoyproxy on doc* hosts T403663
  • 17:23 mutante: upgrading envoyproxy on releases* hosts T403663
  • 16:13 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:07 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:47 zabe@deploy1003: Finished scap sync-world: Backport for addWiki: Stop populating the interwiki table on new wikis, addWiki: Stop populating the interwiki table on new wikis, InstallPreConfigured: Allow subclasses to skip tasks, InstallPreConfigured: Allow subclasses to skip tasks (duration: 11m 22s)
  • 15:44 jgleeson: payments-wiki upgraded from 1c58560c to 1b7a47a6
  • 15:41 zabe@deploy1003: zabe: Continuing with sync
  • 15:41 zabe@deploy1003: zabe: Backport for addWiki: Stop populating the interwiki table on new wikis, addWiki: Stop populating the interwiki table on new wikis, InstallPreConfigured: Allow subclasses to skip tasks, InstallPreConfigured: Allow subclasses to skip tasks synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:35 zabe@deploy1003: Started scap sync-world: Backport for addWiki: Stop populating the interwiki table on new wikis, addWiki: Stop populating the interwiki table on new wikis, InstallPreConfigured: Allow subclasses to skip tasks, InstallPreConfigured: Allow subclasses to skip tasks
  • 15:00 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:59 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:59 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:57 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:57 swfrench-wmf: migrated shellbox-timeline to PHP 8.3 - T403284
  • 14:56 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:56 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:56 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 14:55 jforrester@deploy1003: Finished scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part II (T397401) (duration: 12m 38s)
  • 14:50 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:48 jforrester@deploy1003: jforrester: Backport for Enable Wikifunctions client mode on Wiktionaries, Part II (T397401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:42 jforrester@deploy1003: Started scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part II (T397401)
  • 14:35 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:34 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 14:33 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Add wiki label for special_create_account (T402767), hCaptcha: Add wiki label for special_create_account (T402767) (duration: 11m 36s)
  • 14:33 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:33 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS bookworm
  • 14:32 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 14:29 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 14:29 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 14:28 kharlan@deploy1003: kharlan: Continuing with sync
  • 14:27 kharlan@deploy1003: kharlan: Backport for hCaptcha: Add wiki label for special_create_account (T402767), hCaptcha: Add wiki label for special_create_account (T402767) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:22 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Add wiki label for special_create_account (T402767), hCaptcha: Add wiki label for special_create_account (T402767)
  • 14:17 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 moritzm: upgrading Envoy on IDP hosts T403663
  • 14:05 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:05 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:04 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:03 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:50 jgleeson: SmashPig upgraded from 70316e96 to 9d901f99
  • 13:49 moritzm: imported jenkins 2.516.3 for bullseye and bookworm T404856
  • 13:44 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es2027 - Depool es2027.codfw.wmnet to then clone it to es2050.codfw.wmnet - fceratto@cumin1002
  • 13:35 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool es2027.codfw.wmnet to then clone it to es2050.codfw.wmnet - fceratto@cumin1002
  • 13:35 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2027.codfw.wmnet onto es2050.codfw.wmnet
  • 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depool es2027 T402859', diff saved to https://phabricator.wikimedia.org/P83408 and previous config saved to /var/cache/conftool/dbconfig/20250917-133454-fceratto.json
  • 13:34 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 13:29 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es2027 - Depool for cloning
  • 13:29 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool for cloning
  • 13:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1232 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83407 and previous config saved to /var/cache/conftool/dbconfig/20250917-131718-ladsgroup.json
  • 13:04 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone_es (exit_code=99) of es2027.codfw.wmnet onto es2050.codfw.wmnet
  • 13:03 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es2027 - Depool es2027.codfw.wmnet to then clone it to es2050.codfw.wmnet - fceratto@cumin1002
  • 13:03 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool es2027.codfw.wmnet to then clone it to es2050.codfw.wmnet - fceratto@cumin1002
  • 13:03 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es2027 - Depool es2027.codfw.wmnet to then clone it to es2050.codfw.wmnet - fceratto@cumin1002
  • 13:01 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2027 - Depool es2027.codfw.wmnet to then clone it to es2050.codfw.wmnet - fceratto@cumin1002
  • 13:01 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2027.codfw.wmnet onto es2050.codfw.wmnet
  • 13:01 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2050.codfw.wmnet
  • 12:30 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for es2050.codfw.wmnet
  • 12:25 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Use correct DB domain in SuggestedInvestigationsCaseLookupService (T404846) (duration: 13m 37s)
  • 12:20 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 12:17 dreamyjazz@deploy1003: dreamyjazz: Backport for Use correct DB domain in SuggestedInvestigationsCaseLookupService (T404846) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:11 dreamyjazz@deploy1003: Started scap sync-world: Backport for Use correct DB domain in SuggestedInvestigationsCaseLookupService (T404846)
  • 12:03 Daimona: Creating new tables for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T400719
  • 11:57 dreamyjazz@deploy1003: Finished scap sync-world: Backport for SI: Load ext.checkUser.styles on Special:SuggestedInvestigations (T404712) (duration: 11m 29s)
  • 11:54 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2049 slowly with 10 steps - Pooling in new host
  • 11:52 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 11:51 dreamyjazz@deploy1003: dreamyjazz: Backport for SI: Load ext.checkUser.styles on Special:SuggestedInvestigations (T404712) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:46 dreamyjazz@deploy1003: Started scap sync-world: Backport for SI: Load ext.checkUser.styles on Special:SuggestedInvestigations (T404712)
  • 11:39 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump weight of db2152 in general group (T403966)', diff saved to https://phabricator.wikimedia.org/P83401 and previous config saved to /var/cache/conftool/dbconfig/20250917-112010-ladsgroup.json
  • 11:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump weight of db1167 in general group (T403966)', diff saved to https://phabricator.wikimedia.org/P83400 and previous config saved to /var/cache/conftool/dbconfig/20250917-111858-ladsgroup.json
  • 11:16 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 11:10 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:09 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:03 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:02 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 11:02 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 11:00 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 11:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1196 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83398 and previous config saved to /var/cache/conftool/dbconfig/20250917-105946-ladsgroup.json
  • 10:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s3 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83397 and previous config saved to /var/cache/conftool/dbconfig/20250917-105709-ladsgroup.json
  • 10:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s1 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83395 and previous config saved to /var/cache/conftool/dbconfig/20250917-105102-ladsgroup.json
  • 10:47 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set virtual domain mapping for virtual-checkuser (T404830), Deploy suggested investigations to testwiki and test2wiki (T404830) (duration: 15m 58s)
  • 10:46 moritzm: trigger full OSM import on maps2011 T381565
  • 10:42 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 10:37 dreamyjazz@deploy1003: dreamyjazz: Backport for Set virtual domain mapping for virtual-checkuser (T404830), Deploy suggested investigations to testwiki and test2wiki (T404830) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:36 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1169 from api group of s1 (T403966)', diff saved to https://phabricator.wikimedia.org/P83393 and previous config saved to /var/cache/conftool/dbconfig/20250917-103306-ladsgroup.json
  • 10:31 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set virtual domain mapping for virtual-checkuser (T404830), Deploy suggested investigations to testwiki and test2wiki (T404830)
  • 10:27 dreamyjazz@deploy1003: Sync cancelled.
  • 10:26 dreamyjazz@deploy1003: dreamyjazz: Backport for Deploy suggested investigations to testwiki and test2wiki (T404830) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1235 from api group of s1 (T403966)', diff saved to https://phabricator.wikimedia.org/P83391 and previous config saved to /var/cache/conftool/dbconfig/20250917-102225-ladsgroup.json
  • 10:20 dreamyjazz@deploy1003: Started scap sync-world: Backport for Deploy suggested investigations to testwiki and test2wiki (T404830)
  • 10:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:16 moritzm: installing openjpeg2 security updates
  • 10:06 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 10:05 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 10:04 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 10:02 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 10:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2014.codfw.wmnet with OS bookworm
  • 09:57 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:57 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:52 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:51 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:42 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:42 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-codfw
  • 09:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1184 (s1 candidate master) from api group of s1 (T403966)', diff saved to https://phabricator.wikimedia.org/P83388 and previous config saved to /var/cache/conftool/dbconfig/20250917-094124-ladsgroup.json
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2014.codfw.wmnet with reason: host reimage
  • 09:38 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2049 slowly with 10 steps - Pooling in new host
  • 09:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1251 from api group of s1 (T403966)', diff saved to https://phabricator.wikimedia.org/P83386 and previous config saved to /var/cache/conftool/dbconfig/20250917-093718-ladsgroup.json
  • 09:36 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2014.codfw.wmnet with reason: host reimage
  • 09:35 fceratto@cumin1002: dbctl commit (dc=all): 'Add es2049', diff saved to https://phabricator.wikimedia.org/P83385 and previous config saved to /var/cache/conftool/dbconfig/20250917-093550-fceratto.json
  • 09:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:29 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:28 Amir1: mass deleting watchlist of bots with > 50K watchlist rows (T404808)
  • 09:25 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:22 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:20 elukey@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-codfw
  • 09:19 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:19 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2014.codfw.wmnet with OS bookworm
  • 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump weight of db1206 in general group (T403966)', diff saved to https://phabricator.wikimedia.org/P83384 and previous config saved to /var/cache/conftool/dbconfig/20250917-091137-ladsgroup.json
  • 09:06 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:02 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2013.codfw.wmnet with OS bookworm
  • 08:50 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:42 moritzm: upgrading Envoy on deployment hosts T403663
  • 08:36 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Track events via Prometheus (T402767), hCaptcha: Track events via Prometheus (T402767), hCaptcha: Remove non-existent message, hCaptcha: Remove non-existent message (duration: 13m 41s)
  • 08:33 fabfur: restart pybal on lvs1019/lvs2013/lvs2014 to clear out alert
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage
  • 08:31 kharlan@deploy1003: kharlan: Continuing with sync
  • 08:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage
  • 08:28 kharlan@deploy1003: kharlan: Backport for hCaptcha: Track events via Prometheus (T402767), hCaptcha: Track events via Prometheus (T402767), hCaptcha: Remove non-existent message, hCaptcha: Remove non-existent message synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:27 moritzm: upgrading Envoy on IDM hosts T403663
  • 08:22 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Track events via Prometheus (T402767), hCaptcha: Track events via Prometheus (T402767), hCaptcha: Remove non-existent message, hCaptcha: Remove non-existent message
  • 08:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:13 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2013.codfw.wmnet with OS bookworm
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2012.codfw.wmnet with OS bookworm
  • 07:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:52 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage
  • 07:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage
  • 07:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:30 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable on phase 1 wikis (T402366) (duration: 21m 08s)
  • 07:25 kharlan@deploy1003: kharlan: Continuing with sync
  • 07:24 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm
  • 07:15 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable on phase 1 wikis (T402366) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2011.codfw.wmnet with OS bookworm
  • 07:09 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable on phase 1 wikis (T402366)
  • 07:02 moritzm: upgrading Envoy on debmonitor T403663
  • 06:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 06:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps[2012-2014].codfw.wmnet with reason: in setup
  • 06:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 35s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:38 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 00:38 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 00:37 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 00:37 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply

2025-09-16

  • 23:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: WIP
  • 23:48 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: WIP
  • 23:42 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
  • 23:41 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: apply
  • 23:40 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 23:40 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 23:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 23:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 23:37 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 23:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 23:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 23:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 23:34 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 23:34 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 23:33 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 23:32 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 23:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 23:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 23:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 23:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 23:28 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 23:27 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 23:27 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:27 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:26 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:26 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 23:11 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 23:10 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 23:10 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 23:03 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 23:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 23:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 22:48 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 22:47 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:46 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:45 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:45 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 22:43 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 22:43 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 22:42 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 22:33 rzl@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 22:32 rzl@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 22:31 rzl@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:31 rzl@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 22:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump weight of db1206 in general group (T403966)', diff saved to https://phabricator.wikimedia.org/P83381 and previous config saved to /var/cache/conftool/dbconfig/20250916-223019-ladsgroup.json
  • 22:29 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:27 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s4 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83380 and previous config saved to /var/cache/conftool/dbconfig/20250916-222612-ladsgroup.json
  • 22:25 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:23 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 22:21 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 22:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 22:15 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 22:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 22:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 22:07 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 21:56 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
  • 21:46 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: apply
  • 21:45 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
  • 21:44 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
  • 21:44 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 21:44 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 21:43 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 21:43 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 21:42 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 21:41 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 21:41 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 21:40 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 21:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 21:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 21:39 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 21:39 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 21:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 21:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 21:38 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:37 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 21:37 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:37 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 21:36 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:36 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:36 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:35 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 21:35 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 21:34 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 21:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 21:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 21:32 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 21:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 21:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 21:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 21:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 21:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 21:29 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 21:28 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 21:28 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 21:28 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 21:27 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 21:26 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 21:26 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 21:25 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 21:25 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 21:25 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 21:24 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 21:24 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 21:24 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 21:23 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:22 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 21:21 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 21:20 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 21:19 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 21:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 21:18 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 21:17 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 21:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 21:16 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 21:15 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 21:15 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 21:14 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 21:06 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Set wgHCaptchaApiUrlIntegrityHash and pin secure-api.js version (T404251) (duration: 13m 42s)
  • 21:01 kharlan@deploy1003: kharlan: Continuing with sync
  • 20:58 kharlan@deploy1003: kharlan: Backport for hCaptcha: Set wgHCaptchaApiUrlIntegrityHash and pin secure-api.js version (T404251) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:52 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Set wgHCaptchaApiUrlIntegrityHash and pin secure-api.js version (T404251)
  • 20:49 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable version pinning and subresource integrity (T404251), hCaptcha: Enable version pinning and subresource integrity (T404251) (duration: 12m 31s)
  • 20:43 kharlan@deploy1003: kharlan: Continuing with sync
  • 20:42 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable version pinning and subresource integrity (T404251), hCaptcha: Enable version pinning and subresource integrity (T404251) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:36 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable version pinning and subresource integrity (T404251), hCaptcha: Enable version pinning and subresource integrity (T404251)
  • 20:26 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:21 jhuneidi@deploy1003: Finished scap sync-world: Backport for Throttle exemption for Editathon by Wikimedistas en Cruce - 26 September 2025 (T404592) (duration: 12m 27s)
  • 20:21 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 20:16 jhuneidi@deploy1003: jhuneidi, superpes: Continuing with sync
  • 20:16 jhuneidi@deploy1003: jhuneidi, superpes: Backport for Throttle exemption for Editathon by Wikimedistas en Cruce - 26 September 2025 (T404592) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 jhuneidi@deploy1003: Started scap sync-world: Backport for Throttle exemption for Editathon by Wikimedistas en Cruce - 26 September 2025 (T404592)
  • 20:00 swfrench-wmf: migrated shellbox (score) to PHP 8.3 - T403284
  • 19:59 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 19:58 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 19:52 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.19 refs T396380
  • 19:48 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:48 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:47 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:43 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:41 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:39 eileen: civicrm upgraded from a847a79d to 636ba9d5
  • 19:37 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 19:36 jhuneidi@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.19 refs T396380 (duration: 42m 03s)
  • 19:10 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1018.eqiad.wmnet with OS bookworm
  • 18:54 jhuneidi@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.19 refs T396380
  • 18:44 urbanecm@deploy1003: Finished scap sync-world: Backport for feat: Allow communities to opt out experienced users from mentorship (T403563) (duration: 20m 52s)
  • 18:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
  • 18:32 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
  • 18:23 urbanecm@deploy1003: Started scap sync-world: Backport for feat: Allow communities to opt out experienced users from mentorship (T403563)
  • 18:17 cdanis@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2005.codfw.wmnet
  • 18:16 urbanecm@deploy1003: Sync cancelled.
  • 18:14 cdanis@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM registry2005.codfw.wmnet
  • 18:13 cdanis@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry2004.codfw.wmnet
  • 18:13 cdanis: T404742 💙cdanis@ganeti2032.codfw.wmnet ~ 🕑☕ sudo gnt-instance modify -B memory=16g registry2005.codfw.wmnet
  • 18:09 cdanis@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM registry2004.codfw.wmnet
  • 18:08 cdanis: T404742 💙cdanis@ganeti2032.codfw.wmnet ~ 🕑☕ sudo gnt-instance modify -B memory=16g registry2004.codfw.wmnet
  • 18:08 cdanis@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1005.eqiad.wmnet
  • 18:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1018.eqiad.wmnet with OS bookworm
  • 18:03 cdanis@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM registry1005.eqiad.wmnet
  • 18:03 cdanis@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1004.eqiad.wmnet
  • 17:58 cdanis@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM registry1004.eqiad.wmnet
  • 17:40 cdanis@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1005.eqiad.wmnet
  • 17:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1056.eqiad.wmnet with OS bookworm
  • 17:39 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 17:36 cdanis@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM registry1005.eqiad.wmnet
  • 17:36 cdanis: T404742 💙cdanis@ganeti1046.eqiad.wmnet ~ 🕜☕ sudo gnt-instance modify -B memory=16g registry1005.eqiad.wmnet
  • 17:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 17:34 cdanis@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM registry1004.eqiad.wmnet
  • 17:33 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 17:33 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 17:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
  • 17:30 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 17:30 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 17:29 cdanis@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM registry1004.eqiad.wmnet
  • 17:29 cdanis: T404742 💙cdanis@ganeti1046.eqiad.wmnet ~ 🕜☕ sudo gnt-instance modify -B memory=16g registry1004.eqiad.wmnet
  • 17:27 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
  • 17:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1056.eqiad.wmnet with reason: host reimage
  • 17:00 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1018.eqiad.wmnet with OS bookworm
  • 16:58 urbanecm@deploy1003: urbanecm: Backport for feat: Allow communities to opt out experienced users from mentorship (T403563) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:29 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:18 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:15 urbanecm@deploy1003: Started scap sync-world: Backport for feat: Allow communities to opt out experienced users from mentorship (T403563)
  • 16:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bookworm
  • 16:06 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1052.eqiad.wmnet with OS bookworm
  • 16:04 urbanecm@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.45.0-wmf.17,1.45.0-wmf.18,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/
  • 16:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bookworm
  • 15:55 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:52 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 15:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage
  • 15:45 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 15:44 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage
  • 15:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 15:41 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqiad
  • 15:41 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 15:41 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-eqiad
  • 15:41 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:41 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-d5-eqiad
  • 15:41 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cloudsw1-d5-eqiad
  • 15:40 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-eqiad
  • 15:40 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f1-eqiad
  • 15:40 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-c8-eqiad
  • 15:40 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cloudsw1-c8-eqiad
  • 15:40 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-eqiad
  • 15:40 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:39 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr1-eqiad
  • 15:39 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-b1-codfw
  • 15:39 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cloudsw1-b1-codfw
  • 15:39 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-e4-eqiad
  • 15:39 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cloudsw1-e4-eqiad
  • 15:39 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw1-f4-eqiad
  • 15:39 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cloudsw1-f4-eqiad
  • 15:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e1-eqiad
  • 15:38 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e1-eqiad
  • 15:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e2-eqiad
  • 15:38 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e2-eqiad
  • 15:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e3-eqiad
  • 15:38 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e3-eqiad
  • 15:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f2-eqiad
  • 15:38 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f2-eqiad
  • 15:37 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f3-eqiad
  • 15:37 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f3-eqiad
  • 15:37 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-ulsfo
  • 15:36 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr3-ulsfo
  • 15:36 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr4-ulsfo
  • 15:36 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr4-ulsfo
  • 15:35 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr3-eqsin
  • 15:35 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:35 jhancock@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
  • 15:34 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr3-eqsin
  • 15:34 jhancock@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
  • 15:34 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-drmrs
  • 15:34 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr1-drmrs
  • 15:33 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-esams
  • 15:32 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr1-esams
  • 15:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-drmrs
  • 15:31 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-drmrs
  • 15:30 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-bw27-esams
  • 15:30 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-bw27-esams
  • 15:30 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-esams
  • 15:29 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-esams
  • 15:29 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b12-drmrs
  • 15:29 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-b12-drmrs
  • 15:28 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b13-drmrs
  • 15:28 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-b13-drmrs
  • 15:27 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-by27-esams
  • 15:26 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-by27-esams
  • 15:26 urbanecm@deploy1003: Started scap sync-world: Backport for feat: Allow communities to opt out experienced users from mentorship (T403563)
  • 15:25 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-codfw
  • 15:24 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-codfw
  • 15:24 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-codfw
  • 15:24 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr1-codfw
  • 15:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 15:01 dancy@deploy1003: Finished scap sync-world: Testing for T403882 (duration: 12m 01s)
  • 14:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bookworm
  • 14:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1052.eqiad.wmnet with OS bookworm
  • 14:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bookworm
  • 14:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:49 dancy@deploy1003: Started scap sync-world: Testing for T403882
  • 14:40 moritzm: installing libsndfile security updates
  • 14:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add inline pattern support - oblivian@cumin1003"
  • 14:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add inline pattern support - oblivian@cumin1003
  • 14:38 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add inline pattern support - oblivian@cumin1003
  • 14:38 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add inline pattern support - oblivian@cumin1003"
  • 14:30 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1017.eqiad.wmnet with OS bookworm
  • 14:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: host reimage
  • 14:10 tgr: UTC afternoon deploys done
  • 14:09 tgr@deploy1003: Finished scap sync-world: Backport for Enable JWT session cookies on testwiki and beta (T399631) (duration: 17m 04s)
  • 14:09 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: host reimage
  • 14:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s4 in eqiad (T403966)', diff saved to https://phabricator.wikimedia.org/P83375 and previous config saved to /var/cache/conftool/dbconfig/20250916-140638-ladsgroup.json
  • 14:03 tgr@deploy1003: tgr: Continuing with sync
  • 14:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Fix db1242 weight in s4 (T403966)', diff saved to https://phabricator.wikimedia.org/P83374 and previous config saved to /var/cache/conftool/dbconfig/20250916-140237-ladsgroup.json
  • 14:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1247 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83373 and previous config saved to /var/cache/conftool/dbconfig/20250916-140147-ladsgroup.json
  • 14:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1199 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83372 and previous config saved to /var/cache/conftool/dbconfig/20250916-140020-ladsgroup.json
  • 13:57 tgr@deploy1003: tgr: Backport for Enable JWT session cookies on testwiki and beta (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1160 (candidate master of s4) from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83371 and previous config saved to /var/cache/conftool/dbconfig/20250916-135542-ladsgroup.json
  • 13:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump vslow replicas of s4 in eqiad to 300 (T403966)', diff saved to https://phabricator.wikimedia.org/P83370 and previous config saved to /var/cache/conftool/dbconfig/20250916-135433-ladsgroup.json
  • 13:53 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:52 tgr@deploy1003: Started scap sync-world: Backport for Enable JWT session cookies on testwiki and beta (T399631)
  • 13:51 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:51 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:51 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:48 tgr@deploy1003: Finished scap sync-world: Backport for User: Simplify makeUpdateConditions() (T401748), session: Add a mechanism for forcing a refresh (T399200), Use short expiry for JWT cookies (T399200), tests: Update for SessionCookieJwtExpiration added in core (T399200 T404667), xLab: Fix instrument to produce valid events
  • 13:43 tgr@deploy1003: hueitan, tgr: Continuing with sync
  • 13:41 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1017.eqiad.wmnet with OS bookworm
  • 13:35 tgr@deploy1003: hueitan, tgr: Backport for User: Simplify makeUpdateConditions() (T401748), session: Add a mechanism for forcing a refresh (T399200), Use short expiry for JWT cookies (T399200), tests: Update for SessionCookieJwtExpiration added in core (T399200 T404667), xLab: Fix instrument to produce valid events (T404420)
  • 13:35 claime: repooling cp2041, test inconclusive, rolled back - T402412
  • 13:29 tgr@deploy1003: Started scap sync-world: Backport for User: Simplify makeUpdateConditions() (T401748), session: Add a mechanism for forcing a refresh (T399200), Use short expiry for JWT cookies (T399200), tests: Update for SessionCookieJwtExpiration added in core (T399200 T404667), xLab: Fix instrument to produce valid events
  • 13:17 tgr@deploy1003: Finished scap sync-world: Backport for Lift IP cap for workshop at University of Pretoria on 29-30 September (T404218), Remove feature flag to resolve changelist wikibase link labels (T395674) (duration: 12m 14s)
  • 13:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s7 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83369 and previous config saved to /var/cache/conftool/dbconfig/20250916-131345-ladsgroup.json
  • 13:12 tgr@deploy1003: tgr, joelyrookewmde, anzx: Continuing with sync
  • 13:11 tgr@deploy1003: tgr, joelyrookewmde, anzx: Backport for Lift IP cap for workshop at University of Pretoria on 29-30 September (T404218), Remove feature flag to resolve changelist wikibase link labels (T395674) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1202 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83368 and previous config saved to /var/cache/conftool/dbconfig/20250916-130935-ladsgroup.json
  • 13:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1191 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83367 and previous config saved to /var/cache/conftool/dbconfig/20250916-130618-ladsgroup.json
  • 13:05 tgr@deploy1003: Started scap sync-world: Backport for Lift IP cap for workshop at University of Pretoria on 29-30 September (T404218), Remove feature flag to resolve changelist wikibase link labels (T395674)
  • 13:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1253 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83366 and previous config saved to /var/cache/conftool/dbconfig/20250916-130201-ladsgroup.json
  • 12:18 claime: depooling cp2041 - T402412
  • 12:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1194 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83365 and previous config saved to /var/cache/conftool/dbconfig/20250916-121545-ladsgroup.json
  • 12:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s7 in eqiad (T403966)', diff saved to https://phabricator.wikimedia.org/P83364 and previous config saved to /var/cache/conftool/dbconfig/20250916-120842-ladsgroup.json
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org
  • 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update for routed Ganeti - jmm@cumin2002"
  • 11:18 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update for routed Ganeti - jmm@cumin2002"
  • 10:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T402925)', diff saved to https://phabricator.wikimedia.org/P83363 and previous config saved to /var/cache/conftool/dbconfig/20250916-105944-ladsgroup.json
  • 10:53 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:52 claime: sudo cumin 'A:cp' "enable-puppet 'Deploying multi-dc.lua changes - T402412 - ${USER}'"
  • 10:49 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T402925)', diff saved to https://phabricator.wikimedia.org/P83361 and previous config saved to /var/cache/conftool/dbconfig/20250916-104715-ladsgroup.json
  • 10:47 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P83360 and previous config saved to /var/cache/conftool/dbconfig/20250916-104436-ladsgroup.json
  • 10:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 10:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P83359 and previous config saved to /var/cache/conftool/dbconfig/20250916-103208-ladsgroup.json
  • 10:31 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 10:31 claime: Enabling puppet for testing on cp6011 and cp2041 - T402412 - T400131
  • 10:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P83358 and previous config saved to /var/cache/conftool/dbconfig/20250916-102928-ladsgroup.json
  • 10:27 claime: sudo cumin 'A:cp' "disable-puppet 'Deploying multi-dc.lua changes - T402412 - ${USER}'"
  • 10:19 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1012.eqiad.wmnet with OS trixie
  • 10:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P83357 and previous config saved to /var/cache/conftool/dbconfig/20250916-101700-ladsgroup.json
  • 10:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T402925)', diff saved to https://phabricator.wikimedia.org/P83356 and previous config saved to /var/cache/conftool/dbconfig/20250916-101420-ladsgroup.json
  • 10:10 fabfur: tests look good, enable puppet on A:cp (T401383)
  • 10:08 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS bookworm
  • 10:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 10:04 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 10:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T402925)', diff saved to https://phabricator.wikimedia.org/P83355 and previous config saved to /var/cache/conftool/dbconfig/20250916-100152-ladsgroup.json
  • 09:58 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 09:57 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:54 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 09:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:52 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS bookworm
  • 09:49 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 09:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2220 (T402925)', diff saved to https://phabricator.wikimedia.org/P83354 and previous config saved to /var/cache/conftool/dbconfig/20250916-094846-ladsgroup.json
  • 09:48 ladsgroup@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 09:47 fabfur: disable puppet on A:cp to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188379 (T401383)
  • 09:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2229 (T402925)', diff saved to https://phabricator.wikimedia.org/P83353 and previous config saved to /var/cache/conftool/dbconfig/20250916-094609-ladsgroup.json
  • 09:46 ladsgroup@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 09:44 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Document that test2wiki has suggested investigations DB tables (T404594) (duration: 11m 54s)
  • 09:38 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 09:38 dreamyjazz@deploy1003: dreamyjazz: Backport for Document that test2wiki has suggested investigations DB tables (T404594) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS trixie
  • 09:32 dreamyjazz@deploy1003: Started scap sync-world: Backport for Document that test2wiki has suggested investigations DB tables (T404594)
  • 09:31 elukey@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 09:27 elukey: uploaded spicerack_11.7.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 09:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 09:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 07:35 awight: UTC morning deployments finished
  • 07:34 awight@deploy1003: Finished scap sync-world: Backport for XLab\ResourceLoader\Hooks: Add stream to XLAB_STREAMS (duration: 13m 10s)
  • 07:28 awight@deploy1003: hueitan, awight: Continuing with sync
  • 07:27 awight@deploy1003: hueitan, awight: Backport for XLab\ResourceLoader\Hooks: Add stream to XLAB_STREAMS synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:21 awight@deploy1003: Started scap sync-world: Backport for XLab\ResourceLoader\Hooks: Add stream to XLAB_STREAMS
  • 07:18 awight@deploy1003: Finished scap sync-world: Backport for xLab: Update the PageVisit target wiki for MinT readers (T404420) (duration: 14m 35s)
  • 07:12 awight@deploy1003: awight, hueitan: Continuing with sync
  • 07:09 awight@deploy1003: awight, hueitan: Backport for xLab: Update the PageVisit target wiki for MinT readers (T404420) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:03 awight@deploy1003: Started scap sync-world: Backport for xLab: Update the PageVisit target wiki for MinT readers (T404420)
  • 04:44 eileen: civicrm upgraded from eff6c786 to 3c895373
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.16 (duration: 04m 08s)
  • 03:06 eileen: config revision changed from aca52498 to 76dfebab fundraise-up jobs gone
  • 02:53 eileen: civicrm upgraded from 223550b5 to ebb37b83
  • 00:45 eileen: civicrm upgraded from 223550b5 to ebb37b83
  • 00:29 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 00:29 rzl@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 00:29 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 00:29 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 00:28 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 00:28 rzl@deploy1003: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 00:24 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 00:24 rzl@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 00:24 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 00:24 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 00:23 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 00:23 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 00:23 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 00:23 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 00:23 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 00:23 rzl@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 00:22 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 00:22 rzl@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 00:22 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 00:22 rzl@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 00:22 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 00:22 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 00:21 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 00:21 rzl@deploy1003: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 00:21 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 00:20 rzl@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
  • 00:20 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 00:20 rzl@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 00:19 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 00:19 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 00:17 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 00:13 rzl@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:13 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 00:13 rzl@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 00:11 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 00:09 rzl@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 00:08 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
  • 00:08 rzl@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: apply
  • 00:04 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 00:04 rzl@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply

2025-09-15

  • 23:56 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 23:56 rzl@deploy1003: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 23:54 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 23:54 rzl@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 23:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 23:46 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 23:46 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 23:45 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 23:45 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 23:45 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 23:45 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 23:45 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 23:44 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 23:44 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 23:44 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 23:44 rzl@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 23:44 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 23:44 rzl@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 23:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 23:43 rzl@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 23:43 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 23:43 rzl@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
  • 23:40 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 23:40 rzl@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 23:40 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 23:40 rzl@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 23:40 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 23:39 rzl@deploy1003: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 23:39 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 23:38 rzl@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 23:26 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 48s)
  • 23:24 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 11m 48s)
  • 23:04 eileen: civicrm upgraded from cff6002e to 223550b5
  • 21:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set back the forgotten candidate master weight on s7 codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83351 and previous config saved to /var/cache/conftool/dbconfig/20250915-212526-ladsgroup.json
  • 21:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db2151 from api group in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83349 and previous config saved to /var/cache/conftool/dbconfig/20250915-211838-ladsgroup.json
  • 21:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 21:15 rzl@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 21:15 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:15 rzl@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s6 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83348 and previous config saved to /var/cache/conftool/dbconfig/20250915-211457-ladsgroup.json
  • 21:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 21:14 rzl@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 21:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s6 in eqiad (T403966)', diff saved to https://phabricator.wikimedia.org/P83347 and previous config saved to /var/cache/conftool/dbconfig/20250915-211144-ladsgroup.json
  • 21:10 tgr: late UTC deploys done
  • 21:07 tgr@deploy1003: Finished scap sync-world: Backport for Allow creating new WebAuthn passkeys on private wikis (T378402 T354701), Allow ClosedWikiProvider on the local domain on SUL wikis (T393473 T401640), session: Cache JWT JTI in CookieSessionProvider (T399200) (duration: 17m 23s)
  • 21:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s5 in eqiad (T403966)', diff saved to https://phabricator.wikimedia.org/P83346 and previous config saved to /var/cache/conftool/dbconfig/20250915-210311-ladsgroup.json
  • 21:01 tgr@deploy1003: tgr: Continuing with sync
  • 20:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s5 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83345 and previous config saved to /var/cache/conftool/dbconfig/20250915-205814-ladsgroup.json
  • 20:56 tgr@deploy1003: tgr: Backport for Allow creating new WebAuthn passkeys on private wikis (T378402 T354701), Allow ClosedWikiProvider on the local domain on SUL wikis (T393473 T401640), session: Cache JWT JTI in CookieSessionProvider (T399200) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:50 tgr@deploy1003: Started scap sync-world: Backport for Allow creating new WebAuthn passkeys on private wikis (T378402 T354701), Allow ClosedWikiProvider on the local domain on SUL wikis (T393473 T401640), session: Cache JWT JTI in CookieSessionProvider (T399200)
  • 20:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Revert: Removing one replica from api group in eqiad s2 (T403966)', diff saved to https://phabricator.wikimedia.org/P83344 and previous config saved to /var/cache/conftool/dbconfig/20250915-204922-ladsgroup.json
  • 20:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Removing one replica from api group in eqiad s2 (T403966)', diff saved to https://phabricator.wikimedia.org/P83343 and previous config saved to /var/cache/conftool/dbconfig/20250915-204613-ladsgroup.json
  • 20:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Removing one replica from api group in codfw s2 (T403966)', diff saved to https://phabricator.wikimedia.org/P83342 and previous config saved to /var/cache/conftool/dbconfig/20250915-204251-ladsgroup.json
  • 20:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s2 in codfw in api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83341 and previous config saved to /var/cache/conftool/dbconfig/20250915-203425-ladsgroup.json
  • 20:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s2 in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83340 and previous config saved to /var/cache/conftool/dbconfig/20250915-203337-ladsgroup.json
  • 20:33 cjming@deploy1003: Finished scap sync-world: Backport for Prevent Curation toolbar from preventDefaulting all left click pointer events (T404405) (duration: 11m 48s)
  • 20:27 cjming@deploy1003: cjming, soda: Continuing with sync
  • 20:27 cjming@deploy1003: cjming, soda: Backport for Prevent Curation toolbar from preventDefaulting all left click pointer events (T404405) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 cjming@deploy1003: Started scap sync-world: Backport for Prevent Curation toolbar from preventDefaulting all left click pointer events (T404405)
  • 18:21 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for people1005.eqiad.wmnet
  • 18:21 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for people1005.eqiad.wmnet
  • 18:21 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for people2004.codfw.wmnet
  • 18:21 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for people2004.codfw.wmnet
  • 18:00 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people1005.eqiad.wmnet with reason: in setup
  • 18:00 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people2004.codfw.wmnet with reason: in setup
  • 17:54 dzahn@dns1004: END - running authdns-update
  • 17:52 dzahn@dns1004: START - running authdns-update
  • 17:41 swfrench-wmf: migrated shellbox-media to PHP 8.3 - T403284
  • 17:41 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 17:40 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 17:34 cdanis: 💙cdanis@cumin1003.eqiad.wmnet ~ 🕜☕ sudo cumin 'C:profile::druid::turnilo' 'run-puppet-agent' && sudo cumin 'C:profile::druid::turnilo' 'systemctl restart turnilo'
  • 17:10 jgleeson: payments-wiki upgraded from bea3cdfa to 1c58560c
  • 17:07 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 17:06 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 17:05 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 17:05 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:01 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:47 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:41 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:19 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:17 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2220 gradually with 4 steps - Pooling in after schema change
  • 15:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:48 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2229* gradually with 4 steps - Pool in after flip
  • 14:31 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:31 joal@deploy1003: Finished deploy [analytics/refinery@edfea88] (thin): Unique-devices change for unified routing THIN [analytics/refinery@edfea882] (duration: 01m 17s)
  • 14:30 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2220 gradually with 4 steps - Pooling in after schema change
  • 14:30 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on remaining wikis (T362324) (duration: 15m 49s)
  • 14:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Remove db1254 from api group (T403966)', diff saved to https://phabricator.wikimedia.org/P83335 and previous config saved to /var/cache/conftool/dbconfig/20250915-143008-ladsgroup.json
  • 14:29 joal@deploy1003: Started deploy [analytics/refinery@edfea88] (thin): Unique-devices change for unified routing THIN [analytics/refinery@edfea882]
  • 14:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 14:26 joal@deploy1003: Finished deploy [analytics/refinery@edfea88]: Unique-devices change for unified routing [analytics/refinery@edfea882] (duration: 04m 12s)
  • 14:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Rebalance s2 replicas in eqiad (T403966)', diff saved to https://phabricator.wikimedia.org/P83334 and previous config saved to /var/cache/conftool/dbconfig/20250915-142530-ladsgroup.json
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2220 T404595', diff saved to https://phabricator.wikimedia.org/P83333 and previous config saved to /var/cache/conftool/dbconfig/20250915-142436-fceratto.json
  • 14:22 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, tgr: Continuing with sync
  • 14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2218 to s7 primary T404595', diff saved to https://phabricator.wikimedia.org/P83332 and previous config saved to /var/cache/conftool/dbconfig/20250915-142221-fceratto.json
  • 14:22 joal@deploy1003: Started deploy [analytics/refinery@edfea88]: Unique-devices change for unified routing [analytics/refinery@edfea882]
  • 14:21 joal@deploy1003: Finished deploy [analytics/refinery@edfea88] (hadoop-test): Unique-devices change for unified routing TEST [analytics/refinery@edfea882] (duration: 01m 09s)
  • 14:21 federico3: Starting s7 codfw failover from db2220 to db2218 - T404595
  • 14:20 joal@deploy1003: Started deploy [analytics/refinery@edfea88] (hadoop-test): Unique-devices change for unified routing TEST [analytics/refinery@edfea882]
  • 14:20 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, tgr: Backport for Set $wgPHPSessionHandling to 'disable' on remaining wikis (T362324) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:14 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on remaining wikis (T362324)
  • 14:14 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2218 from API/vslow/dump T404595', diff saved to https://phabricator.wikimedia.org/P83330 and previous config saved to /var/cache/conftool/dbconfig/20250915-141412-fceratto.json
  • 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2218 with weight 0 T404595', diff saved to https://phabricator.wikimedia.org/P83329 and previous config saved to /var/cache/conftool/dbconfig/20250915-141343-fceratto.json
  • 14:12 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for SECURITY: Do not let getErrorMessages() etc. return HTML ever, at least for now (T404392), SECURITY: Do not let error type labels or arguments return HTML either (T404392) (duration: 36m 28s)
  • 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T404595
  • 14:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump general weight of db1259 to 500 (T403966)', diff saved to https://phabricator.wikimedia.org/P83328 and previous config saved to /var/cache/conftool/dbconfig/20250915-140822-ladsgroup.json
  • 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Setting db2203 weight', diff saved to https://phabricator.wikimedia.org/P83327 and previous config saved to /var/cache/conftool/dbconfig/20250915-140555-fceratto.json
  • 14:03 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2229* gradually with 4 steps - Pool in after flip
  • 14:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Removing db1259 from dumps/vslow group (T403966)', diff saved to https://phabricator.wikimedia.org/P83325 and previous config saved to /var/cache/conftool/dbconfig/20250915-140115-ladsgroup.json
  • 13:59 lucaswerkmeister-wmde@deploy1003: jforrester, lucaswerkmeister-wmde: Continuing with sync
  • 13:59 lucaswerkmeister-wmde@deploy1003: jforrester, lucaswerkmeister-wmde: Backport for SECURITY: Do not let getErrorMessages() etc. return HTML ever, at least for now (T404392), SECURITY: Do not let error type labels or arguments return HTML either (T404392) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:53 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:50 Dreamy_Jazz: Created suggested investigation database tables on test2wiki - T404594
  • 13:49 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Fix weight of s3 replicas in codfw (T403966)', diff saved to https://phabricator.wikimedia.org/P83324 and previous config saved to /var/cache/conftool/dbconfig/20250915-134537-ladsgroup.json
  • 13:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Bump weight of db1211 (T403966)', diff saved to https://phabricator.wikimedia.org/P83323 and previous config saved to /var/cache/conftool/dbconfig/20250915-134220-ladsgroup.json
  • 13:37 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:37 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:36 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:36 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:36 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:36 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SECURITY: Do not let getErrorMessages() etc. return HTML ever, at least for now (T404392), SECURITY: Do not let error type labels or arguments return HTML either (T404392)
  • 13:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2229 T404586', diff saved to https://phabricator.wikimedia.org/P83322 and previous config saved to /var/cache/conftool/dbconfig/20250915-133322-fceratto.json
  • 13:31 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T404586', diff saved to https://phabricator.wikimedia.org/P83321 and previous config saved to /var/cache/conftool/dbconfig/20250915-133108-fceratto.json
  • 13:28 federico3: Starting s6 codfw failover from db2229 to db2214 - T404586
  • 13:24 jgleeson: SmashPig upgraded from 4206f06c to 70316e96
  • 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T404586', diff saved to https://phabricator.wikimedia.org/P83320 and previous config saved to /var/cache/conftool/dbconfig/20250915-125903-fceratto.json
  • 12:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T404586
  • 11:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T402925)', diff saved to https://phabricator.wikimedia.org/P83313 and previous config saved to /var/cache/conftool/dbconfig/20250915-115527-ladsgroup.json
  • 11:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P83312 and previous config saved to /var/cache/conftool/dbconfig/20250915-114020-ladsgroup.json
  • 11:35 fabfur: restarting pybal on lvs1020 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1188309 (T404388)
  • 11:35 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1160* gradually with 4 steps - Work done
  • 11:32 btullis@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 11:26 btullis@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 11:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P83310 and previous config saved to /var/cache/conftool/dbconfig/20250915-112512-ladsgroup.json
  • 11:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T402925)', diff saved to https://phabricator.wikimedia.org/P83308 and previous config saved to /var/cache/conftool/dbconfig/20250915-111005-ladsgroup.json
  • 11:08 jiji@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 10:55 jiji@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 10:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2207 (T402925)', diff saved to https://phabricator.wikimedia.org/P83305 and previous config saved to /var/cache/conftool/dbconfig/20250915-104420-ladsgroup.json
  • 10:44 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 10:40 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 10:40 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 10:39 jayme@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 10:39 jayme@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 10:38 btullis@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 10:20 btullis@cumin1003: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 10:17 jiji@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 10:15 ladsgroup@deploy1003: Finished scap sync-world: Backport: Reduce db lock timeout in LinksUpdate and CategoryMembershipChangeJob (T366938) (duration: 42m 06s)
  • 10:14 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1160* gradually with 4 steps - Work done
  • 10:09 jiji@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 10:04 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 10:02 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 09:57 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/termbox: sync
  • 09:56 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/termbox: sync
  • 09:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: sync
  • 09:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: sync
  • 09:53 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync
  • 09:52 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: sync
  • 09:47 btullis@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 09:41 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:40 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:37 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 09:37 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 09:34 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 09:34 elukey@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 09:34 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:33 elukey@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:33 ladsgroup@deploy1003: Started scap sync-world: Backport: Reduce db lock timeout in LinksUpdate and CategoryMembershipChangeJob (T366938)
  • 09:33 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:33 elukey@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:32 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 09:31 elukey@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 09:30 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/termbox: sync
  • 09:30 stevemunene@cumin1003: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster dse-codfw: Cleanup the dse-k8s-codfw cluster
  • 09:30 effie: stopping puppet on A:lvs-low-traffic-eqiad and A:lvs-low-traffic-codfw
  • 09:30 elukey@deploy1003: helmfile [staging] START helmfile.d/services/termbox: sync
  • 09:28 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 09:25 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:20 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync
  • 09:20 elukey@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: sync
  • 09:20 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 09:19 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 09:13 stevemunene@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster dse-codfw: Cleanup the dse-k8s-codfw cluster
  • 08:54 btullis@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 08:15 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:14 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 08:07 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 08:04 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 07:44 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jly out of all services on: 2418 hosts
  • 07:39 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jly out of all services on: 2418 hosts
  • 07:18 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1011.eqiad.wmnet with reason: in setup
  • 07:17 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging jwheeler out of all services on: 2418 hosts
  • 06:56 moritzm: reindex gis database on maps1011 following initial OSM import T381565
  • 06:55 jynus: restarted atftpd on install1004

2025-09-13

  • 02:25 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 02:25 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 02:25 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 02:25 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 02:25 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 02:24 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply

2025-09-12

  • 21:25 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: in setup
  • 21:25 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on zuul2002.codfw.wmnet with reason: in setup
  • 20:20 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1160* gradually with 4 steps - Work done
  • 20:05 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T402763)', diff saved to https://phabricator.wikimedia.org/P83301 and previous config saved to /var/cache/conftool/dbconfig/20250912-200538-ladsgroup.json
  • 19:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P83300 and previous config saved to /var/cache/conftool/dbconfig/20250912-195030-ladsgroup.json
  • 19:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P83299 and previous config saved to /var/cache/conftool/dbconfig/20250912-193523-ladsgroup.json
  • 19:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T402763)', diff saved to https://phabricator.wikimedia.org/P83298 and previous config saved to /var/cache/conftool/dbconfig/20250912-192015-ladsgroup.json
  • 19:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1226 (T402763)', diff saved to https://phabricator.wikimedia.org/P83297 and previous config saved to /var/cache/conftool/dbconfig/20250912-191338-ladsgroup.json
  • 19:13 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 19:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T402763)', diff saved to https://phabricator.wikimedia.org/P83296 and previous config saved to /var/cache/conftool/dbconfig/20250912-191314-ladsgroup.json
  • 18:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P83295 and previous config saved to /var/cache/conftool/dbconfig/20250912-185807-ladsgroup.json
  • 18:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P83294 and previous config saved to /var/cache/conftool/dbconfig/20250912-184259-ladsgroup.json
  • 18:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T402763)', diff saved to https://phabricator.wikimedia.org/P83293 and previous config saved to /var/cache/conftool/dbconfig/20250912-182752-ladsgroup.json
  • 18:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1214 (T402763)', diff saved to https://phabricator.wikimedia.org/P83292 and previous config saved to /var/cache/conftool/dbconfig/20250912-182126-ladsgroup.json
  • 18:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T402763)', diff saved to https://phabricator.wikimedia.org/P83291 and previous config saved to /var/cache/conftool/dbconfig/20250912-182104-ladsgroup.json
  • 18:14 mutante: DNS - added new project language 'tok' (tok.wikipedia.org) (Toki Pona) https://en.wikipedia.org/wiki/Toki_Pona - T404457
  • 18:13 dzahn@dns1004: END - running authdns-update
  • 18:11 dzahn@dns1004: START - running authdns-update
  • 18:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P83290 and previous config saved to /var/cache/conftool/dbconfig/20250912-180557-ladsgroup.json
  • 17:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P83289 and previous config saved to /var/cache/conftool/dbconfig/20250912-175049-ladsgroup.json
  • 17:49 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 17:49 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 17:48 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:48 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T402763)', diff saved to https://phabricator.wikimedia.org/P83288 and previous config saved to /var/cache/conftool/dbconfig/20250912-173542-ladsgroup.json
  • 17:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1209 (T402763)', diff saved to https://phabricator.wikimedia.org/P83287 and previous config saved to /var/cache/conftool/dbconfig/20250912-172909-ladsgroup.json
  • 17:29 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T402763)', diff saved to https://phabricator.wikimedia.org/P83286 and previous config saved to /var/cache/conftool/dbconfig/20250912-172847-ladsgroup.json
  • 17:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P83285 and previous config saved to /var/cache/conftool/dbconfig/20250912-171339-ladsgroup.json
  • 16:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P83284 and previous config saved to /var/cache/conftool/dbconfig/20250912-165832-ladsgroup.json
  • 16:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T402763)', diff saved to https://phabricator.wikimedia.org/P83283 and previous config saved to /var/cache/conftool/dbconfig/20250912-164324-ladsgroup.json
  • 16:38 brett: Manually running clean-stale-certs.service on acmechief2002 - T399419
  • 16:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1203 (T402763)', diff saved to https://phabricator.wikimedia.org/P83281 and previous config saved to /var/cache/conftool/dbconfig/20250912-163605-ladsgroup.json
  • 16:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T402763)', diff saved to https://phabricator.wikimedia.org/P83280 and previous config saved to /var/cache/conftool/dbconfig/20250912-163541-ladsgroup.json
  • 16:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P83279 and previous config saved to /var/cache/conftool/dbconfig/20250912-162033-ladsgroup.json
  • 16:15 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1236.eqiad.wmnet with OS bullseye
  • 16:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P83278 and previous config saved to /var/cache/conftool/dbconfig/20250912-160526-ladsgroup.json
  • 16:01 herron@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
  • 15:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T402763)', diff saved to https://phabricator.wikimedia.org/P83277 and previous config saved to /var/cache/conftool/dbconfig/20250912-155018-ladsgroup.json
  • 15:43 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
  • 15:43 herron@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
  • 15:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1192 (T402763)', diff saved to https://phabricator.wikimedia.org/P83276 and previous config saved to /var/cache/conftool/dbconfig/20250912-154253-ladsgroup.json
  • 15:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T402763)', diff saved to https://phabricator.wikimedia.org/P83275 and previous config saved to /var/cache/conftool/dbconfig/20250912-154231-ladsgroup.json
  • 15:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P83274 and previous config saved to /var/cache/conftool/dbconfig/20250912-152724-ladsgroup.json
  • 15:21 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
  • 15:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P83272 and previous config saved to /var/cache/conftool/dbconfig/20250912-151216-ladsgroup.json
  • 15:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T402763)', diff saved to https://phabricator.wikimedia.org/P83271 and previous config saved to /var/cache/conftool/dbconfig/20250912-145709-ladsgroup.json
  • 14:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1178 (T402763)', diff saved to https://phabricator.wikimedia.org/P83270 and previous config saved to /var/cache/conftool/dbconfig/20250912-144934-ladsgroup.json
  • 14:49 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 14:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T402763)', diff saved to https://phabricator.wikimedia.org/P83269 and previous config saved to /var/cache/conftool/dbconfig/20250912-144911-ladsgroup.json
  • 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1236.eqiad.wmnet with OS bullseye
  • 14:40 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 14:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P83268 and previous config saved to /var/cache/conftool/dbconfig/20250912-143404-ladsgroup.json
  • 14:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P83267 and previous config saved to /var/cache/conftool/dbconfig/20250912-141856-ladsgroup.json
  • 14:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 14:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T402763)', diff saved to https://phabricator.wikimedia.org/P83266 and previous config saved to /var/cache/conftool/dbconfig/20250912-140348-ladsgroup.json
  • 13:57 jmm@dns1004: END - running authdns-update
  • 13:56 jmm@dns1004: START - running authdns-update
  • 13:53 jmm@dns1004: END - running authdns-update
  • 13:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1177 (T402763)', diff saved to https://phabricator.wikimedia.org/P83265 and previous config saved to /var/cache/conftool/dbconfig/20250912-135309-ladsgroup.json
  • 13:53 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T402763)', diff saved to https://phabricator.wikimedia.org/P83264 and previous config saved to /var/cache/conftool/dbconfig/20250912-135246-ladsgroup.json
  • 13:52 jmm@dns1004: START - running authdns-update
  • 13:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P83262 and previous config saved to /var/cache/conftool/dbconfig/20250912-133739-ladsgroup.json
  • 13:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P83261 and previous config saved to /var/cache/conftool/dbconfig/20250912-132231-ladsgroup.json
  • 13:21 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 13:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T402763)', diff saved to https://phabricator.wikimedia.org/P83260 and previous config saved to /var/cache/conftool/dbconfig/20250912-130724-ladsgroup.json
  • 12:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1172 (T402763)', diff saved to https://phabricator.wikimedia.org/P83259 and previous config saved to /var/cache/conftool/dbconfig/20250912-125949-ladsgroup.json
  • 12:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T402763)', diff saved to https://phabricator.wikimedia.org/P83256 and previous config saved to /var/cache/conftool/dbconfig/20250912-125309-ladsgroup.json
  • 12:38 logmsgbot: jforrester Deployed security patch for T404392
  • 12:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P83254 and previous config saved to /var/cache/conftool/dbconfig/20250912-123801-ladsgroup.json
  • 12:29 arnaudb@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Update
  • 12:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P83253 and previous config saved to /var/cache/conftool/dbconfig/20250912-122254-ladsgroup.json
  • 12:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T402763)', diff saved to https://phabricator.wikimedia.org/P83252 and previous config saved to /var/cache/conftool/dbconfig/20250912-120746-ladsgroup.json
  • 12:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1167 (T402763)', diff saved to https://phabricator.wikimedia.org/P83251 and previous config saved to /var/cache/conftool/dbconfig/20250912-120106-ladsgroup.json
  • 12:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1160* gradually with 4 steps - Work done
  • 11:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 11:48 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 11:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:29 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db2202.codfw.wmnet
  • 11:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:18 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host db2202.codfw.wmnet
  • 11:18 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host db2202.codfw.wmnet
  • 11:17 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host db2202.codfw.wmnet
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host db1300.eqiad.wmnet
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1300.eqiad.wmnet
  • 10:55 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 10:43 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 10:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 10:25 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 10:23 elukey: upgrade spicerack to 11.6.0 on all cumin hosts
  • 10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:17 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:12 elukey: uploaded spicerack_11.6.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 10:11 jgleeson: SmashPig upgraded from 19ea35fc to 4206f06c
  • 10:06 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 10:06 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 09:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 09:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1234.eqiad.wmnet with OS bullseye
  • 09:36 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 09:30 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 09:15 awight@deploy1003: Finished scap sync-world: Backport for Revert "Remove refs from reference lists if there are no references left to them" (T356471) (duration: 13m 14s)
  • 09:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1234.eqiad.wmnet with reason: host reimage
  • 09:10 awight@deploy1003: awight: Continuing with sync
  • 09:09 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1234.eqiad.wmnet with reason: host reimage
  • 09:08 awight@deploy1003: awight: Backport for Revert "Remove refs from reference lists if there are no references left to them" (T356471) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:02 awight@deploy1003: Started scap sync-world: Backport for Revert "Remove refs from reference lists if there are no references left to them" (T356471)
  • 08:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1234.eqiad.wmnet with OS bullseye
  • 08:43 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "T398438 - btullis@cumin1003"
  • 08:43 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "T398438 - btullis@cumin1003"
  • 08:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1233.eqiad.wmnet with OS bullseye
  • 08:43 btullis@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 08:30 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 08:30 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 08:22 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:19 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 08:18 elukey@cumin2002: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:16 elukey: uploaded spicerack_11.5.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 08:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1014.eqiad.wmnet with reason: host reimage
  • 07:59 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1014.eqiad.wmnet with reason: host reimage
  • 07:53 fabfur: temporarily upgrading haproxykafka on cp7001 to a test version to check for possible encoding issues (T401383)
  • 07:40 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 07:37 moritzm: installing libcpanel-json-xs-perl security updates
  • 07:33 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 07:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 05:30 arnaudb@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Update

2025-09-11

  • 23:28 jforrester@deploy1003: Finished scap sync-world: Backport for Surface custom errors on ZObjectStringRenderer and FunctionInputParser fields (T395475) (duration: 13m 01s)
  • 23:21 jforrester@deploy1003: jforrester: Continuing with sync
  • 23:21 jforrester@deploy1003: jforrester: Backport for Surface custom errors on ZObjectStringRenderer and FunctionInputParser fields (T395475) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:15 jforrester@deploy1003: Started scap sync-world: Backport for Surface custom errors on ZObjectStringRenderer and FunctionInputParser fields (T395475)
  • 22:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 22:20 swfrench@deploy1003: Finished scap sync-world: Backport for Configure cookie-based enrollment in PHP 8.3 (T403657) (duration: 41m 28s)
  • 22:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 22:08 swfrench@deploy1003: swfrench: Continuing with sync
  • 22:04 swfrench@deploy1003: swfrench: Backport for Configure cookie-based enrollment in PHP 8.3 (T403657) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:39 swfrench@deploy1003: Started scap sync-world: Backport for Configure cookie-based enrollment in PHP 8.3 (T403657)
  • 21:30 logmsgbot: jforrester Deployed security patch for T404392
  • 20:40 cscott@deploy1003: Finished scap sync-world: Backport for Deploy Parsoid Read Views to 23 Wikipedias (T404390), Configure high-risk countries for CampaignEvents (T402353) (duration: 14m 02s)
  • 20:35 cscott@deploy1003: daimona, cscott: Continuing with sync
  • 20:32 cscott@deploy1003: daimona, cscott: Backport for Deploy Parsoid Read Views to 23 Wikipedias (T404390), Configure high-risk countries for CampaignEvents (T402353) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:26 cscott@deploy1003: Started scap sync-world: Backport for Deploy Parsoid Read Views to 23 Wikipedias (T404390), Configure high-risk countries for CampaignEvents (T402353)
  • 20:20 sbassett@deploy1003: Finished scap sync-world: Backport for Optionally encrypt OTP secret in the database (T145915) (duration: 12m 38s)
  • 20:14 sbassett@deploy1003: sbassett: Continuing with sync
  • 20:13 sbassett@deploy1003: sbassett: Backport for Optionally encrypt OTP secret in the database (T145915) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 sbassett@deploy1003: Started scap sync-world: Backport for Optionally encrypt OTP secret in the database (T145915)
  • 19:55 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 19:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 19:34 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 19:16 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 19:01 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
  • 18:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 18:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 18:21 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
  • 18:21 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 18:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:14 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.18 refs T396379
  • 18:13 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1056.eqiad.wmnet with OS bookworm
  • 18:06 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:05 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:05 jhancock@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker2003
  • 18:05 jhancock@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker2003
  • 18:04 jhancock@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dse-k8s-worker2003 to codfw - jhancock@cumin1002"
  • 18:02 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dse-k8s-worker2003 to codfw - jhancock@cumin1002"
  • 17:59 jhancock@cumin1002: START - Cookbook sre.dns.netbox
  • 17:57 jhancock@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:54 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:54 jhancock@cumin1002: START - Cookbook sre.dns.netbox
  • 17:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 17:20 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:17 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:16 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:15 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:14 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:05 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:56 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 16:27 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:20 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:17 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:16 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:15 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:14 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:14 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:13 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:12 jhancock@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2006
  • 16:12 jhancock@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2006
  • 16:10 jhancock@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:10 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-ctrl2006 to codfw - jhancock@cumin1002"
  • 16:10 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-ctrl2006 to codfw - jhancock@cumin1002"
  • 16:04 jhancock@cumin1002: START - Cookbook sre.dns.netbox
  • 15:52 sukhe@dns1004: END - running authdns-update
  • 15:51 sukhe@dns1004: START - running authdns-update
  • 15:50 sukhe@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=proxoid
  • 15:44 sukhe: sudo cumin 'A:dnsbox' run-puppet-agent
  • 15:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1233.eqiad.wmnet with reason: host reimage
  • 15:42 sukhe: lvs201[34]: restart pybal to test proxoid service
  • 15:42 sukhe: lvs1019: restart pybal to test proxoid service
  • 15:39 sukhe: sudo cumin 'A:lvs and (A:eqiad or A:codfw)' 'run-puppet-agent --enable "adding new service proxoid"'
  • 15:39 sukhe: restarting pybal on lvs201[34], lvs1016 for proxoid change
  • 15:38 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1233.eqiad.wmnet with reason: host reimage
  • 15:37 sukhe: lvs1020: restart pybal to test proxoid service
  • 15:35 sukhe: sudo cumin 'A:lvs and (A:eqiad or A:codfw)' 'disable-puppet "adding new service proxoid"': T403416
  • 15:31 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
  • 15:31 sukhe: sudo cumin "O:url_downloader" "run-puppet-agent": T403416
  • 15:22 aokoth@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on gitlab2002.wikimedia.org with reason: Upgrade
  • 15:21 aokoth@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on gitlab1003.wikimedia.org with reason: Upgrade
  • 15:15 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1233.eqiad.wmnet with OS bullseye
  • 15:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 15:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 14:59 ejegg: fundraising civicrm upgraded from 04298941 to cff6002e
  • 14:57 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: sync
  • 14:57 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: sync
  • 14:57 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: sync
  • 14:56 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: sync
  • 14:54 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/page-analytics: sync
  • 14:54 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/page-analytics: sync
  • 14:54 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: sync
  • 14:54 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: sync
  • 14:53 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: sync
  • 14:53 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: sync
  • 14:53 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: sync
  • 14:52 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: sync
  • 14:52 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
  • 14:51 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
  • 14:51 swfrench-wmf: incrementally running puppet agent on A:cp - T403655
  • 14:51 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/image-suggestion: sync
  • 14:51 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
  • 14:49 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: sync
  • 14:49 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: sync
  • 14:48 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/geo-analytics: sync
  • 14:48 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/geo-analytics: sync
  • 14:48 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
  • 14:48 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: sync
  • 14:47 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: sync
  • 14:47 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/editor-analytics: sync
  • 14:46 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/editor-analytics: sync
  • 14:45 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: sync
  • 14:45 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: sync
  • 14:45 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/edit-analytics: sync
  • 14:45 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/edit-analytics: sync
  • 14:44 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 14:42 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: sync
  • 14:42 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 14:41 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: sync
  • 14:41 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/device-analytics: sync
  • 14:41 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/device-analytics: sync
  • 14:37 swfrench-wmf: disabled puppet on A:cp - T403655
  • 14:27 vgutierrez: repool cp7001
  • 14:26 swfrench-wmf: migrated shellbox-constraints to PHP 8.3 - T403284
  • 14:26 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 14:25 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 14:18 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proxoid.svc.eqiad.wmnet on all recursors
  • 14:18 sukhe@cumin1003: START - Cookbook sre.dns.wipe-cache proxoid.svc.eqiad.wmnet on all recursors
  • 14:17 sukhe@dns1004: END - running authdns-update
  • 14:16 sukhe@dns1004: START - running authdns-update
  • 14:08 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding proxoid service IPs - sukhe@cumin1003"
  • 14:08 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding proxoid service IPs - sukhe@cumin1003"
  • 14:06 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2006
  • 14:06 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
  • 14:04 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 14:04 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:03 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2006
  • 14:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 14:03 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
  • 14:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 14:01 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:01 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 14:00 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 13:58 sukhe@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=proxoid,service=nginx [reason: setting weight for proxoid]
  • 13:57 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2006
  • 13:57 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
  • 13:56 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host sretest2006
  • 13:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
  • 13:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Reboot
  • 13:52 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1012.eqiad.wmnet
  • 13:50 moritzm: installing kitty security updates
  • 13:44 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1012.eqiad.wmnet
  • 13:39 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1184* gradually with 4 steps - Work done
  • 13:32 vgutierrez: depool cp7001
  • 13:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:26 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:26 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:23 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2207* gradually with 4 steps - Work done
  • 13:22 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
  • 13:22 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
  • 13:15 moritzm: installing imagemagick security updates
  • 13:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
  • 13:05 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
  • 13:01 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 13:01 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 12:56 jgleeson: payments-wiki upgraded from 9988ba11 to bea3cdfa
  • 12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 12:54 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1184* gradually with 4 steps - Work done
  • 12:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1236.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:46 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1236.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:41 XioNoX: push pfw policy - T404256
  • 12:37 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db2207* gradually with 4 steps - Work done
  • 12:36 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2207.codfw.wmnet
  • 12:32 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 12:30 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2207 - Upgrading db2207.codfw.wmnet
  • 12:30 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db2207 - Upgrading db2207.codfw.wmnet
  • 12:30 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db2207.codfw.wmnet
  • 12:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Upgrade db2207 for semi-sync bug', diff saved to https://phabricator.wikimedia.org/P83240 and previous config saved to /var/cache/conftool/dbconfig/20250911-122956-ladsgroup.json
  • 12:25 jgleeson: SmashPig upgraded from d73c3d9a to 19ea35fc
  • 12:19 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Update
  • 12:13 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:13 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:13 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:12 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:11 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:11 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:10 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Update
  • 12:10 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:09 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
  • 12:08 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1215.eqiad.wmnet
  • 12:02 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1215.eqiad.wmnet
  • 12:01 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
  • 12:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1235.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:45 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1235.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:39 moritzm: installing apache2 security updates
  • 11:31 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:30 moritzm: installing shadow security updates
  • 11:27 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1184.eqiad.wmnet
  • 11:17 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1184 - Upgrading db1184.eqiad.wmnet
  • 11:17 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1184 - Upgrading db1184.eqiad.wmnet
  • 11:16 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1184.eqiad.wmnet
  • 11:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1184 T404326', diff saved to https://phabricator.wikimedia.org/P83239 and previous config saved to /var/cache/conftool/dbconfig/20250911-111545-ladsgroup.json
  • 11:13 ladsgroup@dns1004: END - running authdns-update
  • 11:12 ladsgroup@dns1004: START - running authdns-update
  • 11:09 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on druid[1012-1013].eqiad.wmnet with reason: New druid_public hosts in setup
  • 11:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T404326', diff saved to https://phabricator.wikimedia.org/P83238 and previous config saved to /var/cache/conftool/dbconfig/20250911-110821-ladsgroup.json
  • 11:07 Amir1: Starting s1 eqiad failover from db1184 to db1163 - T404326
  • 11:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T404326', diff saved to https://phabricator.wikimedia.org/P83237 and previous config saved to /var/cache/conftool/dbconfig/20250911-110036-ladsgroup.json
  • 11:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1163 with weight 0 T404326', diff saved to https://phabricator.wikimedia.org/P83236 and previous config saved to /var/cache/conftool/dbconfig/20250911-105959-ladsgroup.json
  • 10:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T404326
  • 10:59 ladsgroup@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T404326
  • 10:49 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2179 gradually with 4 steps - Pooling in
  • 10:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 10:42 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2230.codfw.wmnet
  • 10:37 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
  • 10:35 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1176.eqiad.wmnet
  • 10:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1234.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:31 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 10:30 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 10:30 claime: roll-restarting proton@eqiad
  • 10:29 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1176.eqiad.wmnet
  • 10:28 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1160.eqiad.wmnet
  • 10:23 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1160 - Upgrading db1160.eqiad.wmnet
  • 10:23 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1160 - Upgrading db1160.eqiad.wmnet
  • 10:23 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1160.eqiad.wmnet
  • 10:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1160 T404274', diff saved to https://phabricator.wikimedia.org/P83233 and previous config saved to /var/cache/conftool/dbconfig/20250911-102232-ladsgroup.json
  • 10:22 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 10:21 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 10:17 ladsgroup@dns1004: END - running authdns-update
  • 10:15 ladsgroup@dns1004: START - running authdns-update
  • 10:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1244 to s4 primary and set section read-write T404274', diff saved to https://phabricator.wikimedia.org/P83231 and previous config saved to /var/cache/conftool/dbconfig/20250911-101247-ladsgroup.json
  • 10:09 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2179 gradually with 4 steps - Pooling in
  • 10:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T404274', diff saved to https://phabricator.wikimedia.org/P83229 and previous config saved to /var/cache/conftool/dbconfig/20250911-100348-ladsgroup.json
  • 10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Switch db2237 and db2179 weights', diff saved to https://phabricator.wikimedia.org/P83228 and previous config saved to /var/cache/conftool/dbconfig/20250911-100328-fceratto.json
  • 10:00 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 09:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1244 with weight 0 T404274', diff saved to https://phabricator.wikimedia.org/P83227 and previous config saved to /var/cache/conftool/dbconfig/20250911-094414-ladsgroup.json
  • 09:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T404274
  • 09:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 09:29 fabfur: upgrading haproxykafka to v0.3.16 on A:cp to test new feature (https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/merge_requests/101) (T403176)
  • 09:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 09:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2179 T404299', diff saved to https://phabricator.wikimedia.org/P83226 and previous config saved to /var/cache/conftool/dbconfig/20250911-091626-fceratto.json
  • 09:13 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2240 to s4 primary T404299', diff saved to https://phabricator.wikimedia.org/P83225 and previous config saved to /var/cache/conftool/dbconfig/20250911-091347-fceratto.json
  • 09:12 federico3: Starting s4 codfw failover from db2179 to db2240 - T404299
  • 09:07 fabfur: upgrading haproxykafka to v0.3.16 on cp3066 to test new feature (https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/merge_requests/101) (T403176)
  • 09:07 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2240 from API/vslow/dump T404299', diff saved to https://phabricator.wikimedia.org/P83224 and previous config saved to /var/cache/conftool/dbconfig/20250911-090708-fceratto.json
  • 09:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T404299
  • 09:01 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1234.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1233.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:29 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1233.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:17 moritzm: installing systemd bugfix updates on trixie
  • 08:11 moritzm: kick off full OSM import for the new maps cluster in eqiad T381565
  • 07:52 moritzm: upload bacula 9.6.7-7+wmf13u1 to component/bacula9 for trixie-wikimedia T404114
  • 07:47 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 07:15 kartik@deploy1003: Finished scap sync-world: Backport for Add namespace alias for scn wiki (T375979) (duration: 11m 26s)
  • 07:10 kartik@deploy1003: srishakatux, kartik: Continuing with sync
  • 07:08 kartik@deploy1003: srishakatux, kartik: Backport for Add namespace alias for scn wiki (T375979) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:03 kartik@deploy1003: Started scap sync-world: Backport for Add namespace alias for scn wiki (T375979)
  • 06:23 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7679
  • 06:22 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 7679
  • 00:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T402763)', diff saved to https://phabricator.wikimedia.org/P83223 and previous config saved to /var/cache/conftool/dbconfig/20250911-004705-ladsgroup.json
  • 00:45 swfrench-wmf: finished single-replica PHP 8.3 pilot on shellbox-constraints - T403284
  • 00:45 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 00:45 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 00:44 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 00:44 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 00:44 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1209* gradually with 4 steps - Work done
  • 00:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P83221 and previous config saved to /var/cache/conftool/dbconfig/20250911-003157-ladsgroup.json
  • 00:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P83219 and previous config saved to /var/cache/conftool/dbconfig/20250911-001650-ladsgroup.json
  • 00:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T402763)', diff saved to https://phabricator.wikimedia.org/P83217 and previous config saved to /var/cache/conftool/dbconfig/20250911-000142-ladsgroup.json
  • 00:00 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1257* gradually with 4 steps - Work done

2025-09-10

  • 23:58 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1209* gradually with 4 steps - Work done
  • 23:58 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1209.eqiad.wmnet
  • 23:47 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1209 - Upgrading db1209.eqiad.wmnet
  • 23:47 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1209 - Upgrading db1209.eqiad.wmnet
  • 23:47 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1209.eqiad.wmnet
  • 23:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1209 T404277', diff saved to https://phabricator.wikimedia.org/P83213 and previous config saved to /var/cache/conftool/dbconfig/20250910-234456-ladsgroup.json
  • 23:44 ladsgroup@dns1004: END - running authdns-update
  • 23:43 ladsgroup@dns1004: START - running authdns-update
  • 23:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1193 to s8 primary and set section read-write T404277', diff saved to https://phabricator.wikimedia.org/P83212 and previous config saved to /var/cache/conftool/dbconfig/20250910-234049-ladsgroup.json
  • 23:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2240 (T402763)', diff saved to https://phabricator.wikimedia.org/P83211 and previous config saved to /var/cache/conftool/dbconfig/20250910-233943-ladsgroup.json
  • 23:39 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 23:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T404277', diff saved to https://phabricator.wikimedia.org/P83210 and previous config saved to /var/cache/conftool/dbconfig/20250910-233902-ladsgroup.json
  • 23:38 Amir1: Starting s8 eqiad failover from db1209 to db1193 - T404277
  • 23:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1193 with weight 0 T404277', diff saved to https://phabricator.wikimedia.org/P83209 and previous config saved to /var/cache/conftool/dbconfig/20250910-233428-ladsgroup.json
  • 23:33 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T404277
  • 23:26 rzl: sudo -i reprepro copy trixie-wikimedia bullseye-wikimedia envoyproxy # T403663
  • 23:26 rzl: sudo -i reprepro copy bookworm-wikimedia bullseye-wikimedia envoyproxy # T403663
  • 23:25 rzl: sudo -i reprepro -C main includedeb bullseye-wikimedia /srv/wikimedia/pool/component/envoy-future/e/envoyproxy/envoyproxy_1.29.12-1_amd64.deb # T403663
  • 23:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1257* gradually with 4 steps - Work done
  • 23:14 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1257.eqiad.wmnet
  • 23:13 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T402763)', diff saved to https://phabricator.wikimedia.org/P83206 and previous config saved to /var/cache/conftool/dbconfig/20250910-231301-ladsgroup.json
  • 23:09 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1257 - Upgrading db1257.eqiad.wmnet
  • 23:08 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1257 - Upgrading db1257.eqiad.wmnet
  • 23:08 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1257.eqiad.wmnet
  • 23:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Reboot', diff saved to https://phabricator.wikimedia.org/P83205 and previous config saved to /var/cache/conftool/dbconfig/20250910-230823-ladsgroup.json
  • 22:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P83204 and previous config saved to /var/cache/conftool/dbconfig/20250910-225753-ladsgroup.json
  • 22:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P83203 and previous config saved to /var/cache/conftool/dbconfig/20250910-224246-ladsgroup.json
  • 22:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T402763)', diff saved to https://phabricator.wikimedia.org/P83202 and previous config saved to /var/cache/conftool/dbconfig/20250910-222738-ladsgroup.json
  • 22:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2237 (T402763)', diff saved to https://phabricator.wikimedia.org/P83201 and previous config saved to /var/cache/conftool/dbconfig/20250910-220245-ladsgroup.json
  • 22:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 22:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T402763)', diff saved to https://phabricator.wikimedia.org/P83200 and previous config saved to /var/cache/conftool/dbconfig/20250910-220222-ladsgroup.json
  • 21:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P83199 and previous config saved to /var/cache/conftool/dbconfig/20250910-214714-ladsgroup.json
  • 21:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P83198 and previous config saved to /var/cache/conftool/dbconfig/20250910-213207-ladsgroup.json
  • 21:32 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 21:30 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 21:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 21:28 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 21:28 rzl@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 21:18 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 21:18 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 21:18 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 21:17 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 21:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T402763)', diff saved to https://phabricator.wikimedia.org/P83197 and previous config saved to /var/cache/conftool/dbconfig/20250910-211659-ladsgroup.json
  • 21:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 21:11 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 21:11 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 21:10 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 21:09 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 21:09 rzl@deploy1003: helmfile [staging] START helmfile.d/services/apertium: apply
  • 20:57 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1016.eqiad.wmnet with OS bookworm
  • 20:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2236 (T402763)', diff saved to https://phabricator.wikimedia.org/P83196 and previous config saved to /var/cache/conftool/dbconfig/20250910-205146-ladsgroup.json
  • 20:51 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 20:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T402763)', diff saved to https://phabricator.wikimedia.org/P83195 and previous config saved to /var/cache/conftool/dbconfig/20250910-205134-ladsgroup.json
  • 20:41 reedy@deploy1003: Finished scap sync-world: Backport for ApiQueryTokens: Persist any new token, instead of depending on the type (T403519), ApiQueryTokens: Persist any new token, instead of depending on the type (T403519), Revert^2 "Set $wgPHPSessionHandling to 'disable' on group1 wikis" (T362324), [gerrit:1187065] Revert^2 "Set $wgPHPSessionHandling to 'disable
  • 20:40 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: host reimage
  • 20:36 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: host reimage
  • 20:36 reedy@deploy1003: reedy, matmarex: Continuing with sync
  • 20:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P83194 and previous config saved to /var/cache/conftool/dbconfig/20250910-203626-ladsgroup.json
  • 20:32 reedy@deploy1003: reedy, matmarex: Backport for ApiQueryTokens: Persist any new token, instead of depending on the type (T403519), ApiQueryTokens: Persist any new token, instead of depending on the type (T403519), Revert^2 "Set $wgPHPSessionHandling to 'disable' on group1 wikis" (T362324), [gerrit:1187065] Revert^2 "Set $wgPHPSessionHandling to 'disable' on grou
  • 20:27 reedy@deploy1003: Started scap sync-world: Backport for ApiQueryTokens: Persist any new token, instead of depending on the type (T403519), ApiQueryTokens: Persist any new token, instead of depending on the type (T403519), Revert^2 "Set $wgPHPSessionHandling to 'disable' on group1 wikis" (T362324), [gerrit:1187065] Revert^2 "Set $wgPHPSessionHandling to 'disable'
  • 20:26 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=rest-gateway-ro,name=codfw [reason: Pooling codfw on new -ro service - T400131]
  • 20:26 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=rest-gateway-ro,name=eqiad [reason: Pooling codfw on new -ro service - T400131]
  • 20:25 swfrench-wmf: ran authdns-update to convert rest-gateway to active/passive - T400131
  • 20:24 swfrench@dns1004: END - running authdns-update
  • 20:23 swfrench@dns1004: START - running authdns-update
  • 20:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P83193 and previous config saved to /var/cache/conftool/dbconfig/20250910-202119-ladsgroup.json
  • 20:20 reedy@deploy1003: Finished scap sync-world: Backport for HookHandler: Do a CentralID lookup directly (T404252), HookHandler: Do a CentralID lookup directly (T404252), Add rights to bypass spam blacklists for azwiki sysops and interface-admins (T400428), build: Updating mediawiki/mediawiki-codesniffer to 48.0.0 (T403781) (duration: 12m 48s)
  • 20:14 reedy@deploy1003: reedy, umherirrender, nmw03: Continuing with sync
  • 20:13 reedy@deploy1003: reedy, umherirrender, nmw03: Backport for HookHandler: Do a CentralID lookup directly (T404252), HookHandler: Do a CentralID lookup directly (T404252), Add rights to bypass spam blacklists for azwiki sysops and interface-admins (T400428), build: Updating mediawiki/mediawiki-codesniffer to 48.0.0 (T403781) synced to the testse
  • 20:10 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1016.eqiad.wmnet with OS bookworm
  • 20:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 20:07 reedy@deploy1003: Started scap sync-world: Backport for HookHandler: Do a CentralID lookup directly (T404252), HookHandler: Do a CentralID lookup directly (T404252), Add rights to bypass spam blacklists for azwiki sysops and interface-admins (T400428), build: Updating mediawiki/mediawiki-codesniffer to 48.0.0 (T403781)
  • 20:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T402763)', diff saved to https://phabricator.wikimedia.org/P83192 and previous config saved to /var/cache/conftool/dbconfig/20250910-200612-ladsgroup.json
  • 19:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns for new IPs for ssw1-d1-eqiad - cmooney@cumin1003"
  • 19:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns for new IPs for ssw1-d1-eqiad - cmooney@cumin1003"
  • 19:49 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 19:44 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 19:44 cmooney@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 19:44 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 19:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 19:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2219 (T402763)', diff saved to https://phabricator.wikimedia.org/P83190 and previous config saved to /var/cache/conftool/dbconfig/20250910-194200-ladsgroup.json
  • 19:41 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 19:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T402763)', diff saved to https://phabricator.wikimedia.org/P83189 and previous config saved to /var/cache/conftool/dbconfig/20250910-194134-ladsgroup.json
  • 19:33 cmooney@dns2005: END - running authdns-update
  • 19:32 cmooney@dns2005: START - running authdns-update
  • 19:28 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:28 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate new snippet files for reverse dns zones added for ssw1-d1-eqiad links - cmooney@cumin1003"
  • 19:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P83188 and previous config saved to /var/cache/conftool/dbconfig/20250910-192626-ladsgroup.json
  • 19:25 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate new snippet files for reverse dns zones added for ssw1-d1-eqiad links - cmooney@cumin1003"
  • 19:21 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 19:11 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 19:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P83187 and previous config saved to /var/cache/conftool/dbconfig/20250910-191119-ladsgroup.json
  • 18:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T402763)', diff saved to https://phabricator.wikimedia.org/P83186 and previous config saved to /var/cache/conftool/dbconfig/20250910-185611-ladsgroup.json
  • 18:50 swfrench-wmf: running puppet agent on A:dnsbox hosts - T400131
  • 18:43 swfrench-wmf: temporarily disabling puppet agent on A:dnsbox hosts - T400131
  • 18:39 swfrench-wmf: ran authdns-update to add rest-gateway-ro and point rest-gateway at it - T400131
  • 18:35 swfrench@dns1004: END - running authdns-update
  • 18:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2210 (T402763)', diff saved to https://phabricator.wikimedia.org/P83185 and previous config saved to /var/cache/conftool/dbconfig/20250910-183449-ladsgroup.json
  • 18:34 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 18:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T402763)', diff saved to https://phabricator.wikimedia.org/P83184 and previous config saved to /var/cache/conftool/dbconfig/20250910-183426-ladsgroup.json
  • 18:34 swfrench@dns1004: START - running authdns-update
  • 18:33 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1256* gradually with 4 steps - Work done
  • 18:28 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=rest-gateway-ro,name=eqiad [reason: Pooling eqiad on new -ro service - T400131]
  • 18:28 dduvall: elevated error rate during wmf.18 group1 promotion. All were `$aspect must use one of the XXX_USAGE constants` errors originating from wmf.17 (cc T404238)
  • 18:26 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2006-dev.codfw.wmnet with OS bookworm
  • 18:20 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1236
  • 18:20 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1236
  • 18:20 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1235
  • 18:19 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1235
  • 18:19 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1234
  • 18:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P83182 and previous config saved to /var/cache/conftool/dbconfig/20250910-181918-ladsgroup.json
  • 18:18 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1234
  • 18:16 swfrench-wmf: running puppet agent on A:dnsbox hosts - T400131
  • 18:15 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.18 refs T396379
  • 18:13 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1233
  • 18:11 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1233
  • 18:10 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:10 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moving 4 servers to the analytics vlan - btullis@cumin1003"
  • 18:10 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moving 4 servers to the analytics vlan - btullis@cumin1003"
  • 18:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 18:06 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 18:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P83180 and previous config saved to /var/cache/conftool/dbconfig/20250910-180411-ladsgroup.json
  • 18:02 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 17:54 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 17:53 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 17:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1233-1236].eqiad.wmnet
  • 17:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:53 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1233-1236].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 17:49 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:49 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T402763)', diff saved to https://phabricator.wikimedia.org/P83178 and previous config saved to /var/cache/conftool/dbconfig/20250910-174903-ladsgroup.json
  • 17:48 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:48 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:47 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1256* gradually with 4 steps - Work done
  • 17:47 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1256.eqiad.wmnet
  • 17:45 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1233-1236].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 17:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2006-dev.codfw.wmnet with OS bookworm
  • 17:41 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1256 - Upgrading db1256.eqiad.wmnet
  • 17:41 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1256 - Upgrading db1256.eqiad.wmnet
  • 17:40 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1256.eqiad.wmnet
  • 17:39 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2185.codfw.wmnet
  • 17:37 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bookworm
  • 17:35 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 17:34 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db2185.codfw.wmnet
  • 17:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2206 (T402763)', diff saved to https://phabricator.wikimedia.org/P83175 and previous config saved to /var/cache/conftool/dbconfig/20250910-172817-ladsgroup.json
  • 17:28 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 17:23 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker[1233-1236].eqiad.wmnet
  • 17:19 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 17:15 Amir1: dropping user_autocreate_serial on sul wikis where empty (T397367)
  • 17:11 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 17:10 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:10 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:10 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T402763)', diff saved to https://phabricator.wikimedia.org/P83174 and previous config saved to /var/cache/conftool/dbconfig/20250910-170944-ladsgroup.json
  • 17:02 ejegg: fundraising civicrm upgraded from 4ac726d1 to 04298941
  • 17:00 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:58 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:55 swfrench-wmf: started single-replica PHP 8.3 pilot on shellbox-constraints - T403284
  • 16:55 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:55 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 16:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P83173 and previous config saved to /var/cache/conftool/dbconfig/20250910-165436-ladsgroup.json
  • 16:54 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bookworm
  • 16:48 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:48 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:47 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:46 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 16:46 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 16:45 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:45 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:43 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:43 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P83171 and previous config saved to /var/cache/conftool/dbconfig/20250910-163929-ladsgroup.json
  • 16:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 16:27 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:26 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:24 jgleeson: payments-wiki upgraded from 10d200b1 to 9988ba11
  • 16:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T402763)', diff saved to https://phabricator.wikimedia.org/P83170 and previous config saved to /var/cache/conftool/dbconfig/20250910-162421-ladsgroup.json
  • 16:23 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 16:11 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=rest-gateway,name=codfw [reason: Depooling codfw ahead of switch to active-passive - T400131]
  • 16:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 16:05 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:04 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:04 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:03 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 16:03 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 16:02 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:01 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:00 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:59 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2172 (T402763)', diff saved to https://phabricator.wikimedia.org/P83169 and previous config saved to /var/cache/conftool/dbconfig/20250910-155911-ladsgroup.json
  • 15:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T402763)', diff saved to https://phabricator.wikimedia.org/P83168 and previous config saved to /var/cache/conftool/dbconfig/20250910-155847-ladsgroup.json
  • 15:57 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:54 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 15:50 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:50 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:49 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:49 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P83167 and previous config saved to /var/cache/conftool/dbconfig/20250910-154340-ladsgroup.json
  • 15:43 fceratto@cumin1002: dbctl commit (dc=all): 'Swap db2213 and 2223 weights', diff saved to https://phabricator.wikimedia.org/P83166 and previous config saved to /var/cache/conftool/dbconfig/20250910-154331-fceratto.json
  • 15:40 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:40 btullis@cumin1003: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 15:38 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:33 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 15:33 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:30 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P83165 and previous config saved to /var/cache/conftool/dbconfig/20250910-152725-ladsgroup.json
  • 15:26 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 15:26 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:22 pt1979@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqsin with reason: router upgrade
  • 15:22 papaul: disable OSPF on mr1-eqsin to test BGP
  • 15:21 ejegg: standalone SmashPig upgraded from 0fccf147 to d73c3d9a
  • 15:19 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:18 btullis@cumin1003: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 15:16 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=rest-gateway,name=codfw [reason: Repooling codfw while investigating provisioning of proton service - T400131]
  • 15:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T402763)', diff saved to https://phabricator.wikimedia.org/P83164 and previous config saved to /var/cache/conftool/dbconfig/20250910-151216-ladsgroup.json
  • 15:10 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:08 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2204 and db2207 weights after flip', diff saved to https://phabricator.wikimedia.org/P83163 and previous config saved to /var/cache/conftool/dbconfig/20250910-150800-fceratto.json
  • 14:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2212 gradually with 4 steps - pooling in
  • 14:56 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2212 gradually with 4 steps - pooling in
  • 14:55 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=rest-gateway,name=codfw [reason: Depooling codfw ahead of switch to active-passive - T400131]
  • 14:55 jforrester@deploy1003: Finished scap sync-world: Backport for Improve performance of preferred labels subquery (duration: 13m 44s)
  • 14:50 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:49 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T401906)', diff saved to https://phabricator.wikimedia.org/P83162 and previous config saved to /var/cache/conftool/dbconfig/20250910-144932-fceratto.json
  • 14:47 jforrester@deploy1003: jforrester: Backport for Improve performance of preferred labels subquery synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2155 (T402763)', diff saved to https://phabricator.wikimedia.org/P83161 and previous config saved to /var/cache/conftool/dbconfig/20250910-144732-ladsgroup.json
  • 14:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T402763)', diff saved to https://phabricator.wikimedia.org/P83160 and previous config saved to /var/cache/conftool/dbconfig/20250910-144710-ladsgroup.json
  • 14:46 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:41 jforrester@deploy1003: Started scap sync-world: Backport for Improve performance of preferred labels subquery
  • 14:36 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:36 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:35 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:35 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:34 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:34 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P83159 and previous config saved to /var/cache/conftool/dbconfig/20250910-143424-fceratto.json
  • 14:34 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:33 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:32 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P83158 and previous config saved to /var/cache/conftool/dbconfig/20250910-143202-ladsgroup.json
  • 14:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:26 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:25 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:25 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:25 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:24 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:24 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:19 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P83157 and previous config saved to /var/cache/conftool/dbconfig/20250910-141917-fceratto.json
  • 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P83156 and previous config saved to /var/cache/conftool/dbconfig/20250910-141655-ladsgroup.json
  • 14:12 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 kartik@deploy1003: Finished scap sync-world: Backport for CX3 Build 1.0.0+20250909 (T374886 T394998 T399122 T399125 T399133 T403730 T404045 T404093) (duration: 24m 08s)
  • 14:11 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:06 kartik@deploy1003: sbisson, kartik: Continuing with sync
  • 14:06 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:04 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:04 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:04 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:04 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:04 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:04 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T401906)', diff saved to https://phabricator.wikimedia.org/P83155 and previous config saved to /var/cache/conftool/dbconfig/20250910-140410-fceratto.json
  • 14:03 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T401906)', diff saved to https://phabricator.wikimedia.org/P83154 and previous config saved to /var/cache/conftool/dbconfig/20250910-140159-fceratto.json
  • 14:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T402763)', diff saved to https://phabricator.wikimedia.org/P83153 and previous config saved to /var/cache/conftool/dbconfig/20250910-140147-ladsgroup.json
  • 13:57 fceratto@cumin1002: dbctl commit (dc=all): 'Set weight on db2204', diff saved to https://phabricator.wikimedia.org/P83152 and previous config saved to /var/cache/conftool/dbconfig/20250910-135720-fceratto.json
  • 13:55 fceratto@cumin1002: dbctl commit (dc=all): 'Reset weights on db2212 and db2203', diff saved to https://phabricator.wikimedia.org/P83151 and previous config saved to /var/cache/conftool/dbconfig/20250910-135553-fceratto.json
  • 13:54 kartik@deploy1003: sbisson, kartik: Backport for CX3 Build 1.0.0+20250909 (T374886 T394998 T399122 T399125 T399133 T403730 T404045 T404093) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2161 weight based on db2165', diff saved to https://phabricator.wikimedia.org/P83150 and previous config saved to /var/cache/conftool/dbconfig/20250910-135119-fceratto.json
  • 13:50 bking@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 13:48 bking@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 13:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1235.eqiad.wmnet
  • 13:48 kartik@deploy1003: Started scap sync-world: Backport for CX3 Build 1.0.0+20250909 (T374886 T394998 T399122 T399125 T399133 T403730 T404045 T404093)
  • 13:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T402763)', diff saved to https://phabricator.wikimedia.org/P83149 and previous config saved to /var/cache/conftool/dbconfig/20250910-134734-fceratto.json
  • 13:46 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:45 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:42 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 13:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T402763)', diff saved to https://phabricator.wikimedia.org/P83148 and previous config saved to /var/cache/conftool/dbconfig/20250910-134046-fceratto.json
  • 13:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T401906)', diff saved to https://phabricator.wikimedia.org/P83147 and previous config saved to /var/cache/conftool/dbconfig/20250910-133746-fceratto.json
  • 13:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2147 (T402763)', diff saved to https://phabricator.wikimedia.org/P83146 and previous config saved to /var/cache/conftool/dbconfig/20250910-133729-ladsgroup.json
  • 13:37 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:36 kartik@deploy1003: Finished scap sync-world: Backport for Desktop publish_success: add revid and pageid (T402975) (duration: 15m 41s)
  • 13:33 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 13:32 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2165 to s8 primary T404192', diff saved to https://phabricator.wikimedia.org/P83145 and previous config saved to /var/cache/conftool/dbconfig/20250910-133231-fceratto.json
  • 13:31 federico3: Starting s8 codfw failover from db2161 to db2165 - T404192
  • 13:31 kartik@deploy1003: kartik, sbisson: Continuing with sync
  • 13:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1235.eqiad.wmnet
  • 13:29 btullis@cumin1003: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1233-1236].eqiad.wmnet
  • 13:26 kartik@deploy1003: kartik, sbisson: Backport for Desktop publish_success: add revid and pageid (T402975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:24 btullis@cumin1003: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1233-1236].eqiad.wmnet
  • 13:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T404192
  • 13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P83143 and previous config saved to /var/cache/conftool/dbconfig/20250910-132239-fceratto.json
  • 13:20 kartik@deploy1003: Started scap sync-world: Backport for Desktop publish_success: add revid and pageid (T402975)
  • 13:17 moritzm: installing apache2 security updates
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Set weight on db2212', diff saved to https://phabricator.wikimedia.org/P83142 and previous config saved to /var/cache/conftool/dbconfig/20250910-131728-fceratto.json
  • 13:15 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T402763)', diff saved to https://phabricator.wikimedia.org/P83141 and previous config saved to /var/cache/conftool/dbconfig/20250910-131538-ladsgroup.json
  • 13:05 kart_: Updated Recommendation API to 2025-09-10-080042-production (T403730, T403976, T400562)
  • 13:02 tchanders@deploy1003: Finished scap sync-world: Backport for Enable temporary accounts on all medium-sized projects (T403399), Enable temporary accounts on metawiki (T402181) (duration: 16m 06s)
  • 13:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250910-130026-ladsgroup.json
  • 12:57 tchanders@deploy1003: tchanders, stran: Continuing with sync
  • 12:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1215.eqiad.wmnet with reason: Glow up (T399540 T394371)
  • 12:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T401906)', diff saved to https://phabricator.wikimedia.org/P83139 and previous config saved to /var/cache/conftool/dbconfig/20250910-125224-fceratto.json
  • 12:52 fceratto@cumin1002: dbctl commit (dc=all): 'Set weight on db2203', diff saved to https://phabricator.wikimedia.org/P83138 and previous config saved to /var/cache/conftool/dbconfig/20250910-125216-fceratto.json
  • 12:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T401906)', diff saved to https://phabricator.wikimedia.org/P83137 and previous config saved to /var/cache/conftool/dbconfig/20250910-125108-fceratto.json
  • 12:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 12:50 tchanders@deploy1003: tchanders, stran: Backport for Enable temporary accounts on all medium-sized projects (T403399), Enable temporary accounts on metawiki (T402181) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:48 moritzm: installing unbound security updates on bullseye
  • 12:46 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2185.codfw.wmnet with reason: Glow up (T394371)
  • 12:46 tchanders@deploy1003: Started scap sync-world: Backport for Enable temporary accounts on all medium-sized projects (T403399), Enable temporary accounts on metawiki (T402181)
  • 12:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 12:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P83136 and previous config saved to /var/cache/conftool/dbconfig/20250910-124518-ladsgroup.json
  • 12:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T402763)', diff saved to https://phabricator.wikimedia.org/P83135 and previous config saved to /var/cache/conftool/dbconfig/20250910-123831-fceratto.json
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T402763)', diff saved to https://phabricator.wikimedia.org/P83134 and previous config saved to /var/cache/conftool/dbconfig/20250910-123719-fceratto.json
  • 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1236* gradually with 4 steps - Work done
  • 12:31 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:30 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2191.codfw.wmnet
  • 12:30 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2191 gradually with 4 steps - Upgrade of db2191.codfw.wmnet completed
  • 12:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T402763)', diff saved to https://phabricator.wikimedia.org/P83131 and previous config saved to /var/cache/conftool/dbconfig/20250910-123011-ladsgroup.json
  • 12:26 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 12:21 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:20 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 12:19 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
  • 12:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1252 (T402763)', diff saved to https://phabricator.wikimedia.org/P83128 and previous config saved to /var/cache/conftool/dbconfig/20250910-120940-ladsgroup.json
  • 12:09 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T402763)', diff saved to https://phabricator.wikimedia.org/P83127 and previous config saved to /var/cache/conftool/dbconfig/20250910-120917-ladsgroup.json
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T402763)', diff saved to https://phabricator.wikimedia.org/P83125 and previous config saved to /var/cache/conftool/dbconfig/20250910-120024-fceratto.json
  • 11:56 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1173.eqiad.wmnet
  • 11:56 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1173 gradually with 4 steps - Upgrade of db1173.eqiad.wmnet completed
  • 11:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P83122 and previous config saved to /var/cache/conftool/dbconfig/20250910-115409-ladsgroup.json
  • 11:48 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1236* gradually with 4 steps - Work done
  • 11:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2212 (T402925)', diff saved to https://phabricator.wikimedia.org/P83120 and previous config saved to /var/cache/conftool/dbconfig/20250910-114549-ladsgroup.json
  • 11:45 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 11:44 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2191 gradually with 4 steps - Upgrade of db2191.codfw.wmnet completed
  • 11:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P83117 and previous config saved to /var/cache/conftool/dbconfig/20250910-113902-ladsgroup.json
  • 11:33 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2191 - Upgrading db2191.codfw.wmnet
  • 11:33 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2191 - Upgrading db2191.codfw.wmnet
  • 11:33 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2191.codfw.wmnet
  • 11:25 moritzm: installing Linux 6.1.148 on Bookworm hosts
  • 11:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T402763)', diff saved to https://phabricator.wikimedia.org/P83114 and previous config saved to /var/cache/conftool/dbconfig/20250910-112354-ladsgroup.json
  • 11:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T402763)', diff saved to https://phabricator.wikimedia.org/P83113 and previous config saved to /var/cache/conftool/dbconfig/20250910-111503-fceratto.json
  • 11:13 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1236.eqiad.wmnet
  • 11:10 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1173 gradually with 4 steps - Upgrade of db1173.eqiad.wmnet completed
  • 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T402763)', diff saved to https://phabricator.wikimedia.org/P83111 and previous config saved to /var/cache/conftool/dbconfig/20250910-110937-fceratto.json
  • 11:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1236 - Upgrading db1236.eqiad.wmnet
  • 11:07 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1236 - Upgrading db1236.eqiad.wmnet
  • 11:07 moritzm: kick off full OSM import for the new maps cluster in codfw T381565
  • 11:07 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1236.eqiad.wmnet
  • 11:05 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1236.eqiad.wmnet with reason: Clean up the mess
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
  • 11:04 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1173 - Upgrading db1173.eqiad.wmnet
  • 11:04 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1173 - Upgrading db1173.eqiad.wmnet
  • 11:04 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1173.eqiad.wmnet
  • 11:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1249 (T402763)', diff saved to https://phabricator.wikimedia.org/P83110 and previous config saved to /var/cache/conftool/dbconfig/20250910-110343-ladsgroup.json
  • 11:03 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T402763)', diff saved to https://phabricator.wikimedia.org/P83109 and previous config saved to /var/cache/conftool/dbconfig/20250910-110320-ladsgroup.json
  • 10:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
  • 10:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1173.eqiad.wmnet
  • 10:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1236 T404180', diff saved to https://phabricator.wikimedia.org/P83107 and previous config saved to /var/cache/conftool/dbconfig/20250910-105650-ladsgroup.json
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
  • 10:55 ladsgroup@dns1004: END - running authdns-update
  • 10:54 ladsgroup@dns1004: START - running authdns-update
  • 10:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T404180', diff saved to https://phabricator.wikimedia.org/P83106 and previous config saved to /var/cache/conftool/dbconfig/20250910-105205-ladsgroup.json
  • 10:51 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1173 - Upgrading db1173.eqiad.wmnet
  • 10:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T404180', diff saved to https://phabricator.wikimedia.org/P83104 and previous config saved to /var/cache/conftool/dbconfig/20250910-105042-ladsgroup.json
  • 10:50 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1173 - Upgrading db1173.eqiad.wmnet
  • 10:50 Amir1: Starting s7 eqiad failover from db1236 to db1181 - T404180
  • 10:50 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1173.eqiad.wmnet
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
  • 10:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P83103 and previous config saved to /var/cache/conftool/dbconfig/20250910-104813-ladsgroup.json
  • 10:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1181 with weight 0 T404180', diff saved to https://phabricator.wikimedia.org/P83101 and previous config saved to /var/cache/conftool/dbconfig/20250910-104223-ladsgroup.json
  • 10:40 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T404180
  • 10:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repool db1181 T399955', diff saved to https://phabricator.wikimedia.org/P83100 and previous config saved to /var/cache/conftool/dbconfig/20250910-103436-ladsgroup.json
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet
  • 10:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P83099 and previous config saved to /var/cache/conftool/dbconfig/20250910-103305-ladsgroup.json
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T404178', diff saved to https://phabricator.wikimedia.org/P83098 and previous config saved to /var/cache/conftool/dbconfig/20250910-102507-fceratto.json
  • 10:24 federico3: Starting s1 codfw failover from db2212 to db2203 - T404178
  • 10:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T402763)', diff saved to https://phabricator.wikimedia.org/P83097 and previous config saved to /var/cache/conftool/dbconfig/20250910-101758-ladsgroup.json
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet
  • 10:14 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: Glow up
  • 10:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Glow up db1181 (T399955)', diff saved to https://phabricator.wikimedia.org/P83096 and previous config saved to /var/cache/conftool/dbconfig/20250910-101345-ladsgroup.json
  • 10:12 moritzm: imported imposm3 0.14.1-2 T381565
  • 10:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T404178
  • 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1011.eqiad.wmnet
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2014.codfw.wmnet
  • 10:06 moritzm: upgrading Envoy on Phabricator T402584
  • 10:06 moritzm: upgrading Envoy on lists T402584
  • 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2014.codfw.wmnet
  • 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2013.codfw.wmnet
  • 09:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2013.codfw.wmnet
  • 09:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1248 (T402763)', diff saved to https://phabricator.wikimedia.org/P83095 and previous config saved to /var/cache/conftool/dbconfig/20250910-095700-ladsgroup.json
  • 09:56 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 09:56 moritzm: upgrading Envoy on lists T402584
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2012.codfw.wmnet
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2012.codfw.wmnet
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2011.codfw.wmnet
  • 09:47 moritzm: upgrading Envoy on contint T402584
  • 09:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2011.codfw.wmnet
  • 09:38 moritzm: upgrading Envoy on Logstash T402584
  • 09:37 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
  • 09:28 claime: cgoubert@deploy1003:/home$ sudo lvextend -L +20G /dev/vg0/root && sudo resize2fs /dev/vg0/root - T404060
  • 08:19 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1011.eqiad.wmnet
  • 08:19 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1011.eqiad.wmnet
  • 08:19 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:16 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 08:16 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 08:15 jayme@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 08:15 jayme@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 08:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1011.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:12 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1011.eqiad.wmnet
  • 08:06 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1011.eqiad.wmnet
  • 08:06 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1010.eqiad.wmnet
  • 08:06 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1010.eqiad.wmnet
  • 08:05 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:00 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 07:59 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1010.eqiad.wmnet
  • 07:54 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1010.eqiad.wmnet
  • 07:54 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1009.eqiad.wmnet
  • 07:54 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1009.eqiad.wmnet
  • 07:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 07:50 brouberol: upgraded envoy on dse-k8s-eqiad/datasets-config(-next) - T402584
  • 07:49 moritzm: upgrading Envoy on chartmuseum* T402584
  • 07:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 07:46 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1009.eqiad.wmnet
  • 07:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config: apply
  • 07:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
  • 07:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config-next: apply
  • 07:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config-next: apply
  • 07:41 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1009.eqiad.wmnet
  • 07:41 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1009.eqiad.wmnet
  • 07:41 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1009.eqiad.wmnet
  • 07:14 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: sync
  • 07:14 elukey@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: sync
  • 07:14 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: sync
  • 07:13 elukey@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: sync
  • 07:13 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/image-suggestion: sync
  • 07:13 elukey@deploy1003: helmfile [staging] START helmfile.d/services/image-suggestion: sync
  • 07:13 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: sync
  • 07:12 elukey@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: sync
  • 07:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: sync
  • 07:11 elukey@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: sync
  • 07:11 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: sync
  • 07:11 elukey@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: sync
  • 07:06 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
  • 07:06 elukey@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: sync
  • 06:04 moritzm: installing node-minipass security updates
  • 05:46 moritzm: rebalance ganeti03 in esams T402259
  • 03:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T402763)', diff saved to https://phabricator.wikimedia.org/P83094 and previous config saved to /var/cache/conftool/dbconfig/20250910-030243-fceratto.json
  • 02:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P83093 and previous config saved to /var/cache/conftool/dbconfig/20250910-024735-fceratto.json
  • 02:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P83092 and previous config saved to /var/cache/conftool/dbconfig/20250910-023228-fceratto.json
  • 02:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T402763)', diff saved to https://phabricator.wikimedia.org/P83091 and previous config saved to /var/cache/conftool/dbconfig/20250910-021720-fceratto.json
  • 02:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T402763)', diff saved to https://phabricator.wikimedia.org/P83090 and previous config saved to /var/cache/conftool/dbconfig/20250910-021116-fceratto.json
  • 02:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 01:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T402763)', diff saved to https://phabricator.wikimedia.org/P83089 and previous config saved to /var/cache/conftool/dbconfig/20250910-012533-fceratto.json
  • 01:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T402763)', diff saved to https://phabricator.wikimedia.org/P83088 and previous config saved to /var/cache/conftool/dbconfig/20250910-012006-fceratto.json
  • 01:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 01:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 01:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T402763)', diff saved to https://phabricator.wikimedia.org/P83087 and previous config saved to /var/cache/conftool/dbconfig/20250910-011506-fceratto.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 08s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P83086 and previous config saved to /var/cache/conftool/dbconfig/20250910-005958-fceratto.json
  • 00:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P83085 and previous config saved to /var/cache/conftool/dbconfig/20250910-004451-fceratto.json
  • 00:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T402763)', diff saved to https://phabricator.wikimedia.org/P83084 and previous config saved to /var/cache/conftool/dbconfig/20250910-002943-fceratto.json
  • 00:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T402763)', diff saved to https://phabricator.wikimedia.org/P83083 and previous config saved to /var/cache/conftool/dbconfig/20250910-002338-fceratto.json
  • 00:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 00:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T402763)', diff saved to https://phabricator.wikimedia.org/P83082 and previous config saved to /var/cache/conftool/dbconfig/20250910-002315-fceratto.json
  • 00:18 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 00:16 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 00:16 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 00:15 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 00:14 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 00:13 rzl@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 00:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 00:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T402763)', diff saved to https://phabricator.wikimedia.org/P83081 and previous config saved to /var/cache/conftool/dbconfig/20250910-001131-fceratto.json
  • 00:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P83080 and previous config saved to /var/cache/conftool/dbconfig/20250910-000807-fceratto.json

2025-09-09

  • 23:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P83079 and previous config saved to /var/cache/conftool/dbconfig/20250909-235624-fceratto.json
  • 23:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P83078 and previous config saved to /var/cache/conftool/dbconfig/20250909-235300-fceratto.json
  • 23:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P83077 and previous config saved to /var/cache/conftool/dbconfig/20250909-234116-fceratto.json
  • 23:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T402763)', diff saved to https://phabricator.wikimedia.org/P83076 and previous config saved to /var/cache/conftool/dbconfig/20250909-233752-fceratto.json
  • 23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T402763)', diff saved to https://phabricator.wikimedia.org/P83075 and previous config saved to /var/cache/conftool/dbconfig/20250909-233101-fceratto.json
  • 23:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 23:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T402763)', diff saved to https://phabricator.wikimedia.org/P83074 and previous config saved to /var/cache/conftool/dbconfig/20250909-233049-fceratto.json
  • 23:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T402763)', diff saved to https://phabricator.wikimedia.org/P83073 and previous config saved to /var/cache/conftool/dbconfig/20250909-232608-fceratto.json
  • 23:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T402763)', diff saved to https://phabricator.wikimedia.org/P83072 and previous config saved to /var/cache/conftool/dbconfig/20250909-231854-fceratto.json
  • 23:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 23:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T402763)', diff saved to https://phabricator.wikimedia.org/P83071 and previous config saved to /var/cache/conftool/dbconfig/20250909-231831-fceratto.json
  • afk: standalone SmashPig upgraded from 6f2ecf43 to 0fccf147
  • 23:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P83070 and previous config saved to /var/cache/conftool/dbconfig/20250909-231542-fceratto.json
  • 23:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P83069 and previous config saved to /var/cache/conftool/dbconfig/20250909-230323-fceratto.json
  • 23:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P83068 and previous config saved to /var/cache/conftool/dbconfig/20250909-230034-fceratto.json
  • 22:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P83067 and previous config saved to /var/cache/conftool/dbconfig/20250909-224816-fceratto.json
  • 22:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T402763)', diff saved to https://phabricator.wikimedia.org/P83066 and previous config saved to /var/cache/conftool/dbconfig/20250909-224527-fceratto.json
  • 22:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T402763)', diff saved to https://phabricator.wikimedia.org/P83065 and previous config saved to /var/cache/conftool/dbconfig/20250909-223835-fceratto.json
  • 22:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 22:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T402763)', diff saved to https://phabricator.wikimedia.org/P83064 and previous config saved to /var/cache/conftool/dbconfig/20250909-223812-fceratto.json
  • 22:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T402763)', diff saved to https://phabricator.wikimedia.org/P83063 and previous config saved to /var/cache/conftool/dbconfig/20250909-223308-fceratto.json
  • 22:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T402763)', diff saved to https://phabricator.wikimedia.org/P83062 and previous config saved to /var/cache/conftool/dbconfig/20250909-222514-fceratto.json
  • 22:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 22:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T402763)', diff saved to https://phabricator.wikimedia.org/P83061 and previous config saved to /var/cache/conftool/dbconfig/20250909-222451-fceratto.json
  • 22:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P83060 and previous config saved to /var/cache/conftool/dbconfig/20250909-222305-fceratto.json
  • 22:20 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:17 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P83059 and previous config saved to /var/cache/conftool/dbconfig/20250909-220944-fceratto.json
  • 22:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P83058 and previous config saved to /var/cache/conftool/dbconfig/20250909-220757-fceratto.json
  • 22:04 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:04 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:55 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Remove replica instance from wmgMainStashServers (T401227) (duration: 12m 03s)
  • 21:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P83057 and previous config saved to /var/cache/conftool/dbconfig/20250909-215436-fceratto.json
  • 21:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T402763)', diff saved to https://phabricator.wikimedia.org/P83056 and previous config saved to /var/cache/conftool/dbconfig/20250909-215249-fceratto.json
  • 21:49 krinkle@deploy1003: krinkle, bd808: Continuing with sync
  • 21:49 krinkle@deploy1003: krinkle, bd808: Backport for beta: Remove replica instance from wmgMainStashServers (T401227) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T402763)', diff saved to https://phabricator.wikimedia.org/P83055 and previous config saved to /var/cache/conftool/dbconfig/20250909-214449-fceratto.json
  • 21:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 21:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T402763)', diff saved to https://phabricator.wikimedia.org/P83054 and previous config saved to /var/cache/conftool/dbconfig/20250909-214425-fceratto.json
  • 21:43 krinkle@deploy1003: Started scap sync-world: Backport for beta: Remove replica instance from wmgMainStashServers (T401227)
  • 21:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T402763)', diff saved to https://phabricator.wikimedia.org/P83053 and previous config saved to /var/cache/conftool/dbconfig/20250909-213928-fceratto.json
  • 21:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T402763)', diff saved to https://phabricator.wikimedia.org/P83052 and previous config saved to /var/cache/conftool/dbconfig/20250909-213134-fceratto.json
  • 21:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T402763)', diff saved to https://phabricator.wikimedia.org/P83051 and previous config saved to /var/cache/conftool/dbconfig/20250909-213112-fceratto.json
  • 21:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P83050 and previous config saved to /var/cache/conftool/dbconfig/20250909-212918-fceratto.json
  • 21:24 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:21 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P83049 and previous config saved to /var/cache/conftool/dbconfig/20250909-211605-fceratto.json
  • 21:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P83048 and previous config saved to /var/cache/conftool/dbconfig/20250909-211410-fceratto.json
  • 21:11 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:09 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:08 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:07 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 21:06 tgr: UTC late deploys done
  • 21:04 tgr@deploy1003: Finished scap sync-world: Backport for Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake private repo (T399631) (duration: 16m 16s)
  • 21:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P83047 and previous config saved to /var/cache/conftool/dbconfig/20250909-210058-fceratto.json
  • 20:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T402763)', diff saved to https://phabricator.wikimedia.org/P83046 and previous config saved to /var/cache/conftool/dbconfig/20250909-205903-fceratto.json
  • 20:58 tgr@deploy1003: tgr: Continuing with sync
  • 20:57 mutante: deploy1003/deploy2002 - find /srv/patches/ -name "*.patch" -not -perm 0664 -print0 | xargs -0 -r sudo chmod 0664
  • 20:54 tgr@deploy1003: tgr: Backport for Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake private repo (T399631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:52 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T402763)', diff saved to https://phabricator.wikimedia.org/P83045 and previous config saved to /var/cache/conftool/dbconfig/20250909-205226-fceratto.json
  • 20:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 20:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T402763)', diff saved to https://phabricator.wikimedia.org/P83044 and previous config saved to /var/cache/conftool/dbconfig/20250909-205203-fceratto.json
  • 20:49 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:48 tgr@deploy1003: Started scap sync-world: Backport for Add $wgJwtPrivateKey / $wgJwtPublicKey in the fake private repo (T399631)
  • 20:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T402763)', diff saved to https://phabricator.wikimedia.org/P83043 and previous config saved to /var/cache/conftool/dbconfig/20250909-204550-fceratto.json
  • 20:38 cjming@deploy1003: Finished scap sync-world: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_NEW (T397476) (duration: 15m 45s)
  • 20:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T402763)', diff saved to https://phabricator.wikimedia.org/P83042 and previous config saved to /var/cache/conftool/dbconfig/20250909-203754-fceratto.json
  • 20:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 20:36 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1056.eqiad.wmnet with OS bookworm
  • 20:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P83041 and previous config saved to /var/cache/conftool/dbconfig/20250909-203656-fceratto.json
  • 20:36 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:33 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:33 cjming@deploy1003: cjming, daimona: Continuing with sync
  • 20:28 cjming@deploy1003: cjming, daimona: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_NEW (T397476) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:22 cjming@deploy1003: Started scap sync-world: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_NEW (T397476)
  • 20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P83040 and previous config saved to /var/cache/conftool/dbconfig/20250909-202148-fceratto.json
  • 20:16 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:15 dani@deploy1003: Finished scap sync-world: Backport for Deploy Newcomers survey on enwiki (T402915) (duration: 12m 25s)
  • 20:13 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:10 dani@deploy1003: dani: Continuing with sync
  • 20:09 dani@deploy1003: dani: Backport for Deploy Newcomers survey on enwiki (T402915) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 20:07 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:07 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T402763)', diff saved to https://phabricator.wikimedia.org/P83039 and previous config saved to /var/cache/conftool/dbconfig/20250909-200641-fceratto.json
  • 20:05 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 20:03 dani@deploy1003: Started scap sync-world: Backport for Deploy Newcomers survey on enwiki (T402915)
  • 20:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:01 damilare: SmashPig upgraded from 6031b3c4 to 6f2ecf43
  • 19:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T402763)', diff saved to https://phabricator.wikimedia.org/P83038 and previous config saved to /var/cache/conftool/dbconfig/20250909-195955-fceratto.json
  • 19:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 19:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T402763)', diff saved to https://phabricator.wikimedia.org/P83037 and previous config saved to /var/cache/conftool/dbconfig/20250909-195932-fceratto.json
  • 19:58 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T402763)', diff saved to https://phabricator.wikimedia.org/P83036 and previous config saved to /var/cache/conftool/dbconfig/20250909-195210-fceratto.json
  • 19:50 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:49 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:46 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T402763)', diff saved to https://phabricator.wikimedia.org/P83035 and previous config saved to /var/cache/conftool/dbconfig/20250909-194534-fceratto.json
  • 19:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 19:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T402763)', diff saved to https://phabricator.wikimedia.org/P83034 and previous config saved to /var/cache/conftool/dbconfig/20250909-194512-fceratto.json
  • 19:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P83033 and previous config saved to /var/cache/conftool/dbconfig/20250909-194425-fceratto.json
  • 19:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P83032 and previous config saved to /var/cache/conftool/dbconfig/20250909-193005-fceratto.json
  • 19:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P83031 and previous config saved to /var/cache/conftool/dbconfig/20250909-192917-fceratto.json
  • 19:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 19:25 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 19:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P83030 and previous config saved to /var/cache/conftool/dbconfig/20250909-191457-fceratto.json
  • 19:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T402763)', diff saved to https://phabricator.wikimedia.org/P83029 and previous config saved to /var/cache/conftool/dbconfig/20250909-191410-fceratto.json
  • 19:12 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 19:09 dduvall@deploy1003: Finished scap sync-world: Backport for TOTP: Fix logic for displaying TOTPEnableForm (T404091 T230042) (duration: 28m 18s)
  • 19:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T402763)', diff saved to https://phabricator.wikimedia.org/P83026 and previous config saved to /var/cache/conftool/dbconfig/20250909-190716-fceratto.json
  • 19:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T402763)', diff saved to https://phabricator.wikimedia.org/P83025 and previous config saved to /var/cache/conftool/dbconfig/20250909-190654-fceratto.json
  • 19:03 dduvall@deploy1003: dduvall, reedy: Continuing with sync
  • 18:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T402763)', diff saved to https://phabricator.wikimedia.org/P83024 and previous config saved to /var/cache/conftool/dbconfig/20250909-185950-fceratto.json
  • 18:55 ejegg: payments-wiki upgraded from 0117408e to 10d200b1
  • 18:52 ejegg: donorwiki upgraded from bd6de034 to 10d200b1
  • 18:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T402763)', diff saved to https://phabricator.wikimedia.org/P83023 and previous config saved to /var/cache/conftool/dbconfig/20250909-185158-fceratto.json
  • 18:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P83022 and previous config saved to /var/cache/conftool/dbconfig/20250909-185146-fceratto.json
  • 18:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 18:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T402763)', diff saved to https://phabricator.wikimedia.org/P83021 and previous config saved to /var/cache/conftool/dbconfig/20250909-185128-fceratto.json
  • 18:46 dduvall@deploy1003: dduvall, reedy: Backport for TOTP: Fix logic for displaying TOTPEnableForm (T404091 T230042) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:46 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:45 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:40 dduvall@deploy1003: Started scap sync-world: Backport for TOTP: Fix logic for displaying TOTPEnableForm (T404091 T230042)
  • 18:39 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.18 refs T396379
  • 18:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P83020 and previous config saved to /var/cache/conftool/dbconfig/20250909-183639-fceratto.json
  • 18:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P83019 and previous config saved to /var/cache/conftool/dbconfig/20250909-183619-fceratto.json
  • 18:31 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 18:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T402763)', diff saved to https://phabricator.wikimedia.org/P83018 and previous config saved to /var/cache/conftool/dbconfig/20250909-182132-fceratto.json
  • 18:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P83017 and previous config saved to /var/cache/conftool/dbconfig/20250909-182112-fceratto.json
  • 18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T402763)', diff saved to https://phabricator.wikimedia.org/P83016 and previous config saved to /var/cache/conftool/dbconfig/20250909-181448-fceratto.json
  • 18:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T402763)', diff saved to https://phabricator.wikimedia.org/P83015 and previous config saved to /var/cache/conftool/dbconfig/20250909-180605-fceratto.json
  • 17:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T402763)', diff saved to https://phabricator.wikimedia.org/P83014 and previous config saved to /var/cache/conftool/dbconfig/20250909-175808-fceratto.json
  • 17:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 17:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T402763)', diff saved to https://phabricator.wikimedia.org/P83013 and previous config saved to /var/cache/conftool/dbconfig/20250909-175746-fceratto.json
  • 17:47 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS trixie
  • 17:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P83012 and previous config saved to /var/cache/conftool/dbconfig/20250909-174239-fceratto.json
  • 17:38 swfrench-wmf: migrated shellbox-syntaxhighlight to PHP 8.3 in eqiad - T403284
  • 17:38 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:37 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T401906)', diff saved to https://phabricator.wikimedia.org/P83011 and previous config saved to /var/cache/conftool/dbconfig/20250909-173609-fceratto.json
  • 17:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P83010 and previous config saved to /var/cache/conftool/dbconfig/20250909-172731-fceratto.json
  • 17:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P83008 and previous config saved to /var/cache/conftool/dbconfig/20250909-172102-fceratto.json
  • 17:17 wfan: payments-wiki upgraded from bd6de034 to 0117408e
  • 17:15 swfrench-wmf: migrated shellbox-syntaxhighlight to PHP 8.3 in codfw - T403284
  • 17:14 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:14 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:13 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:12 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T402763)', diff saved to https://phabricator.wikimedia.org/P83007 and previous config saved to /var/cache/conftool/dbconfig/20250909-171224-fceratto.json
  • 17:12 swfrench-wmf: restored locally modified `helmfile.d/dse-k8s-services/_airflow_common_/values-dev.yaml` duplicating https://gerrit.wikimedia.org/r/1186497 on deploy1003 to unstick deployment-charts updates
  • 17:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T402763)', diff saved to https://phabricator.wikimedia.org/P83006 and previous config saved to /var/cache/conftool/dbconfig/20250909-171140-fceratto.json
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T402763)', diff saved to https://phabricator.wikimedia.org/P83005 and previous config saved to /var/cache/conftool/dbconfig/20250909-170701-fceratto.json
  • 17:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 17:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P83004 and previous config saved to /var/cache/conftool/dbconfig/20250909-170554-fceratto.json
  • 17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T402763)', diff saved to https://phabricator.wikimedia.org/P83003 and previous config saved to /var/cache/conftool/dbconfig/20250909-170429-fceratto.json
  • 17:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T402763)', diff saved to https://phabricator.wikimedia.org/P83002 and previous config saved to /var/cache/conftool/dbconfig/20250909-170406-fceratto.json
  • 16:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T401906)', diff saved to https://phabricator.wikimedia.org/P83001 and previous config saved to /var/cache/conftool/dbconfig/20250909-165047-fceratto.json
  • 16:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P83000 and previous config saved to /var/cache/conftool/dbconfig/20250909-164858-fceratto.json
  • 16:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T401906)', diff saved to https://phabricator.wikimedia.org/P82999 and previous config saved to /var/cache/conftool/dbconfig/20250909-164836-fceratto.json
  • 16:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 16:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 16:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T401906)', diff saved to https://phabricator.wikimedia.org/P82998 and previous config saved to /var/cache/conftool/dbconfig/20250909-164806-fceratto.json
  • 16:37 topranks: drain: set BGP to graceful shutdown mode on cr1-codfw to drain traffic ahead of power supply test T401937
  • 16:33 topranks: drain transport circuits landing on cr1-codfw ahead of power supply test on router T401937
  • 16:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P82997 and previous config saved to /var/cache/conftool/dbconfig/20250909-163351-fceratto.json
  • 16:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P82996 and previous config saved to /var/cache/conftool/dbconfig/20250909-163258-fceratto.json
  • 16:25 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T404106', diff saved to https://phabricator.wikimedia.org/P82995 and previous config saved to /var/cache/conftool/dbconfig/20250909-162514-fceratto.json
  • 16:24 federico3: Starting s2 codfw failover from db2207 to db2204 - T404106
  • 16:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T402763)', diff saved to https://phabricator.wikimedia.org/P82994 and previous config saved to /var/cache/conftool/dbconfig/20250909-161844-fceratto.json
  • 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P82993 and previous config saved to /var/cache/conftool/dbconfig/20250909-161751-fceratto.json
  • 16:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T404106
  • 16:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T402763)', diff saved to https://phabricator.wikimedia.org/P82992 and previous config saved to /var/cache/conftool/dbconfig/20250909-161142-fceratto.json
  • 16:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T402925)', diff saved to https://phabricator.wikimedia.org/P82991 and previous config saved to /var/cache/conftool/dbconfig/20250909-161010-ladsgroup.json
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T401906)', diff saved to https://phabricator.wikimedia.org/P82990 and previous config saved to /var/cache/conftool/dbconfig/20250909-160243-fceratto.json
  • 16:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T401906)', diff saved to https://phabricator.wikimedia.org/P82989 and previous config saved to /var/cache/conftool/dbconfig/20250909-160032-fceratto.json
  • 16:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 16:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T401906)', diff saved to https://phabricator.wikimedia.org/P82988 and previous config saved to /var/cache/conftool/dbconfig/20250909-160010-fceratto.json
  • 15:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P82987 and previous config saved to /var/cache/conftool/dbconfig/20250909-155503-ladsgroup.json
  • 15:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P82986 and previous config saved to /var/cache/conftool/dbconfig/20250909-154503-fceratto.json
  • 15:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P82985 and previous config saved to /var/cache/conftool/dbconfig/20250909-153955-ladsgroup.json
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P82984 and previous config saved to /var/cache/conftool/dbconfig/20250909-152956-fceratto.json
  • 15:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T402925)', diff saved to https://phabricator.wikimedia.org/P82983 and previous config saved to /var/cache/conftool/dbconfig/20250909-152447-ladsgroup.json
  • 15:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T401906)', diff saved to https://phabricator.wikimedia.org/P82982 and previous config saved to /var/cache/conftool/dbconfig/20250909-151449-fceratto.json
  • 15:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T401906)', diff saved to https://phabricator.wikimedia.org/P82981 and previous config saved to /var/cache/conftool/dbconfig/20250909-151238-fceratto.json
  • 15:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 15:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T401906)', diff saved to https://phabricator.wikimedia.org/P82980 and previous config saved to /var/cache/conftool/dbconfig/20250909-151214-fceratto.json
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3008.esams.wmnet to cluster esams03 and group B
  • 15:07 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams03 and group B
  • 15:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3007.esams.wmnet to cluster esams03 and group B
  • 15:07 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3007.esams.wmnet to cluster esams03 and group B
  • 15:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3008.esams.wmnet to cluster esams03 and group B
  • 15:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3008.esams.wmnet to cluster esams03 and group B
  • 15:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 14:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P82979 and previous config saved to /var/cache/conftool/dbconfig/20250909-145707-fceratto.json
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 14:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1172.eqiad.wmnet onto db1193.eqiad.wmnet
  • 14:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1193 gradually with 4 steps - Pool db1193.eqiad.wmnet in after cloning
  • 14:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P82977 and previous config saved to /var/cache/conftool/dbconfig/20250909-144159-fceratto.json
  • 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T401906)', diff saved to https://phabricator.wikimedia.org/P82975 and previous config saved to /var/cache/conftool/dbconfig/20250909-142652-fceratto.json
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T401906)', diff saved to https://phabricator.wikimedia.org/P82974 and previous config saved to /var/cache/conftool/dbconfig/20250909-142441-fceratto.json
  • 14:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T401906)', diff saved to https://phabricator.wikimedia.org/P82973 and previous config saved to /var/cache/conftool/dbconfig/20250909-142417-fceratto.json
  • 14:15 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1181* gradually with 4 steps - Work done
  • 14:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P82969 and previous config saved to /var/cache/conftool/dbconfig/20250909-140910-fceratto.json
  • 14:03 moritzm: upgrading Envoy on schema* T402584
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 14:00 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 13:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3008.esams.wmnet with OS bookworm
  • 13:58 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1193 gradually with 4 steps - Pool db1193.eqiad.wmnet in after cloning
  • 13:57 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 13:56 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 13:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P82966 and previous config saved to /var/cache/conftool/dbconfig/20250909-135402-fceratto.json
  • 13:53 moritzm: upgrading Envoy on config-master* T402584
  • 13:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T402925)', diff saved to https://phabricator.wikimedia.org/P82964 and previous config saved to /var/cache/conftool/dbconfig/20250909-133952-ladsgroup.json
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3008.esams.wmnet with reason: host reimage
  • 13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T401906)', diff saved to https://phabricator.wikimedia.org/P82963 and previous config saved to /var/cache/conftool/dbconfig/20250909-133855-fceratto.json
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T401906)', diff saved to https://phabricator.wikimedia.org/P82962 and previous config saved to /var/cache/conftool/dbconfig/20250909-133644-fceratto.json
  • 13:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3008.esams.wmnet with reason: host reimage
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T401906)', diff saved to https://phabricator.wikimedia.org/P82961 and previous config saved to /var/cache/conftool/dbconfig/20250909-133621-fceratto.json
  • 13:35 XioNoX: rolling https://gerrit.wikimedia.org/r/c/operations/puppet/+/1186501 one routed ganeti host at a time
  • 13:29 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1181* gradually with 4 steps - Work done
  • 13:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P82959 and previous config saved to /var/cache/conftool/dbconfig/20250909-132444-ladsgroup.json
  • 13:23 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1181.eqiad.wmnet
  • 13:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P82958 and previous config saved to /var/cache/conftool/dbconfig/20250909-132113-fceratto.json
  • 13:19 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:17 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1181 - Upgrading db1181.eqiad.wmnet
  • 13:16 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 13:16 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS trixie
  • 13:16 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db1181 - Upgrading db1181.eqiad.wmnet
  • 13:15 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db1181.eqiad.wmnet
  • 13:15 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2184.codfw.wmnet
  • 13:15 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db2184.codfw.wmnet
  • 13:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:13 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:13 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:10 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 13:09 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS trixie
  • 13:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P82956 and previous config saved to /var/cache/conftool/dbconfig/20250909-130937-ladsgroup.json
  • 13:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P82954 and previous config saved to /var/cache/conftool/dbconfig/20250909-130606-fceratto.json
  • 13:05 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS trixie
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3008.esams.wmnet with OS bookworm
  • 13:02 jynus: upgrading backup1-codfw db2184 mariadb package
  • 13:01 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2184.codfw.wmnet with reason: mariadb upgrade
  • 12:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1172 gradually with 4 steps - Pool db1172.eqiad.wmnet in after cloning
  • 12:57 XioNoX: test hotfix for doh3006 v6 bird
  • 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T401906)', diff saved to https://phabricator.wikimedia.org/P82951 and previous config saved to /var/cache/conftool/dbconfig/20250909-125621-fceratto.json
  • 12:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T402925)', diff saved to https://phabricator.wikimedia.org/P82950 and previous config saved to /var/cache/conftool/dbconfig/20250909-125429-ladsgroup.json
  • 12:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T401906)', diff saved to https://phabricator.wikimedia.org/P82949 and previous config saved to /var/cache/conftool/dbconfig/20250909-125058-fceratto.json
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T401906)', diff saved to https://phabricator.wikimedia.org/P82948 and previous config saved to /var/cache/conftool/dbconfig/20250909-124847-fceratto.json
  • 12:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 12:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T401906)', diff saved to https://phabricator.wikimedia.org/P82947 and previous config saved to /var/cache/conftool/dbconfig/20250909-124817-fceratto.json
  • 12:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2213 (T402925)', diff saved to https://phabricator.wikimedia.org/P82945 and previous config saved to /var/cache/conftool/dbconfig/20250909-124309-ladsgroup.json
  • 12:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P82944 and previous config saved to /var/cache/conftool/dbconfig/20250909-124114-fceratto.json
  • 12:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2240 (T402925)', diff saved to https://phabricator.wikimedia.org/P82943 and previous config saved to /var/cache/conftool/dbconfig/20250909-123859-ladsgroup.json
  • 12:38 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 12:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P82941 and previous config saved to /var/cache/conftool/dbconfig/20250909-123309-fceratto.json
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P82939 and previous config saved to /var/cache/conftool/dbconfig/20250909-122607-fceratto.json
  • 12:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2010.codfw.wmnet with OS bookworm
  • 12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P82938 and previous config saved to /var/cache/conftool/dbconfig/20250909-121802-fceratto.json
  • 12:12 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1172 gradually with 4 steps - Pool db1172.eqiad.wmnet in after cloning
  • 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T401906)', diff saved to https://phabricator.wikimedia.org/P82936 and previous config saved to /var/cache/conftool/dbconfig/20250909-121059-fceratto.json
  • 12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T401906)', diff saved to https://phabricator.wikimedia.org/P82935 and previous config saved to /var/cache/conftool/dbconfig/20250909-120825-fceratto.json
  • 12:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 12:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T401906)', diff saved to https://phabricator.wikimedia.org/P82934 and previous config saved to /var/cache/conftool/dbconfig/20250909-120254-fceratto.json
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T401906)', diff saved to https://phabricator.wikimedia.org/P82933 and previous config saved to /var/cache/conftool/dbconfig/20250909-120043-fceratto.json
  • 12:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T401906)', diff saved to https://phabricator.wikimedia.org/P82932 and previous config saved to /var/cache/conftool/dbconfig/20250909-120021-fceratto.json
  • 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 11:53 btullis@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 11:52 fceratto@cumin1002: dbctl commit (dc=all): 'Set dbctl values for db2213 T404067', diff saved to https://phabricator.wikimedia.org/P82929 and previous config saved to /var/cache/conftool/dbconfig/20250909-115245-fceratto.json
  • 11:47 btullis@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 11:46 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P82928 and previous config saved to /var/cache/conftool/dbconfig/20250909-114514-fceratto.json
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ganeti02 VIP in esams - jmm@cumin2002"
  • 11:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ganeti02 VIP in esams - jmm@cumin2002"
  • 11:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:39 btullis@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
  • 11:37 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2192 to s5 primary T404067', diff saved to https://phabricator.wikimedia.org/P82927 and previous config saved to /var/cache/conftool/dbconfig/20250909-113740-fceratto.json
  • 11:36 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti3008.esams.wmnet with OS bookworm
  • 11:35 federico3: Starting s5 codfw failover from db2213 to db2192 - T404067
  • 11:35 moritzm: update bookworm d-i image T403852
  • 11:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 11:32 btullis@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
  • 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P82926 and previous config saved to /var/cache/conftool/dbconfig/20250909-113006-fceratto.json
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2192 from API/vslow/dump T404067', diff saved to https://phabricator.wikimedia.org/P82925 and previous config saved to /var/cache/conftool/dbconfig/20250909-112828-fceratto.json
  • 11:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T404067
  • 11:15 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1008.eqiad.wmnet
  • 11:15 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1008.eqiad.wmnet
  • 11:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T401906)', diff saved to https://phabricator.wikimedia.org/P82924 and previous config saved to /var/cache/conftool/dbconfig/20250909-111459-fceratto.json
  • 11:14 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T401906)', diff saved to https://phabricator.wikimedia.org/P82923 and previous config saved to /var/cache/conftool/dbconfig/20250909-111250-fceratto.json
  • 11:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T401906)', diff saved to https://phabricator.wikimedia.org/P82922 and previous config saved to /var/cache/conftool/dbconfig/20250909-111226-fceratto.json
  • 11:10 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1008.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 11:09 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 11:07 btullis@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
  • 11:07 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1008.eqiad.wmnet
  • 11:06 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Add the CheckUserSuggestedInvestigationsGetSignals hook (T403111), Define the CheckUserSuggestedInvestigationsBeforeCaseCreated hook (T403959), Follow-up: Add the CheckUserSuggestedInvestigationsSignalMatch hook (T403111), Follow-up: Add the CheckUserSuggestedInvestigationsSignalMatch… (gerrit 1186458)
  • 11:02 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1008.eqiad.wmnet
  • 11:01 btullis@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
  • 10:59 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 10:58 dreamyjazz@deploy1003: dreamyjazz: Backport for Add the CheckUserSuggestedInvestigationsGetSignals hook (T403111), Define the CheckUserSuggestedInvestigationsBeforeCaseCreated hook (T403959), Follow-up: Add the CheckUserSuggestedInvestigationsSignalMatch hook (T403111), Follow-up: Add the CheckUserSuggestedInvestigationsSignalMatch hook (T403111… (gerrit 1186458)
  • 10:57 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 10:57 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 10:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P82921 and previous config saved to /var/cache/conftool/dbconfig/20250909-105719-fceratto.json
  • 10:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:54 btullis@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 10:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:53 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 10:53 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2010.codfw.wmnet with OS bookworm
  • 10:51 dreamyjazz@deploy1003: Started scap sync-world: Backport for Add the CheckUserSuggestedInvestigationsGetSignals hook (T403111), Define the CheckUserSuggestedInvestigationsBeforeCaseCreated hook (T403959), Follow-up: Add the CheckUserSuggestedInvestigationsSignalMatch hook (T403111), Follow-up: Add the CheckUserSuggestedInvestigationsSignalMatch… (gerrit 1186458)
  • 10:49 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:49 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:48 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 10:48 btullis@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 10:48 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1172.eqiad.wmnet onto db1193.eqiad.wmnet
  • 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 10:47 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1172.eqiad.wmnet onto db1193.eqiad.wmnet
  • 10:47 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1172.eqiad.wmnet onto db1193.eqiad.wmnet
  • 10:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2010.codfw.wmnet
  • 10:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1172 for clone', diff saved to https://phabricator.wikimedia.org/P82920 and previous config saved to /var/cache/conftool/dbconfig/20250909-104321-ladsgroup.json
  • 10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P82919 and previous config saved to /var/cache/conftool/dbconfig/20250909-104211-fceratto.json
  • 10:41 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 10:41 btullis@cumin1003: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 10:39 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1193.eqiad.wmnet
  • 10:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2010.codfw.wmnet
  • 10:37 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3008.esams.wmnet with OS bookworm
  • 10:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2010.codfw.wmnet with OS bullseye
  • 10:33 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:31 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1193 - Upgrading db1193.eqiad.wmnet
  • 10:30 ladsgroup@cumin1002: START - Cookbook sre.mysql.depool db1193 - Upgrading db1193.eqiad.wmnet
  • 10:30 ladsgroup@cumin1002: START - Cookbook sre.mysql.upgrade for db1193.eqiad.wmnet
  • 10:29 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2170* gradually with 4 steps - Work done
  • 10:28 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: Glow up
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti3008.esams.wmnet
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 10:27 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1247 gradually with 4 steps - Maint over
  • 10:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1193 T404027', diff saved to https://phabricator.wikimedia.org/P82913 and previous config saved to /var/cache/conftool/dbconfig/20250909-102709-ladsgroup.json
  • 10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T401906)', diff saved to https://phabricator.wikimedia.org/P82912 and previous config saved to /var/cache/conftool/dbconfig/20250909-102704-fceratto.json
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T401906)', diff saved to https://phabricator.wikimedia.org/P82911 and previous config saved to /var/cache/conftool/dbconfig/20250909-102554-fceratto.json
  • 10:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@dns1004: END - running authdns-update
  • 10:22 ladsgroup@dns1004: START - running authdns-update
  • 10:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1209 to s8 primary and set section read-write T404027', diff saved to https://phabricator.wikimedia.org/P82906 and previous config saved to /var/cache/conftool/dbconfig/20250909-101945-ladsgroup.json
  • 10:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T404027', diff saved to https://phabricator.wikimedia.org/P82904 and previous config saved to /var/cache/conftool/dbconfig/20250909-101828-ladsgroup.json
  • 10:18 Amir1: Starting s8 eqiad failover from db1193 to db1209 - T404027
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 10:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1209 with weight 0 T404027', diff saved to https://phabricator.wikimedia.org/P82900 and previous config saved to /var/cache/conftool/dbconfig/20250909-101103-ladsgroup.json
  • 10:10 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T404027
  • 10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2240 API T404050', diff saved to https://phabricator.wikimedia.org/P82899 and previous config saved to /var/cache/conftool/dbconfig/20250909-100925-fceratto.json
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2179 to s4 primary T404050', diff saved to https://phabricator.wikimedia.org/P82896 and previous config saved to /var/cache/conftool/dbconfig/20250909-100534-fceratto.json
  • 10:04 federico3: Starting s4 codfw failover from db2240 to db2179 - T404050
  • 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2179 from API/vslow/dump T404050', diff saved to https://phabricator.wikimedia.org/P82893 and previous config saved to /var/cache/conftool/dbconfig/20250909-095829-fceratto.json
  • 09:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 09:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T404050
  • 09:54 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 09:53 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1247 gradually with 4 steps - Maint over
  • 09:52 vgutierrez: restarting turnilo
  • 09:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T402763)', diff saved to https://phabricator.wikimedia.org/P82891 and previous config saved to /var/cache/conftool/dbconfig/20250909-095158-ladsgroup.json
  • 09:45 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti3008.esams.wmnet
  • 09:43 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db2170* gradually with 4 steps - Work done
  • 09:42 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2170* gradually with 4 steps - Work done
  • 09:36 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db2170* gradually with 4 steps - Work done
  • 09:36 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2170.codfw.wmnet
  • 09:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1247 (T402763)', diff saved to https://phabricator.wikimedia.org/P82889 and previous config saved to /var/cache/conftool/dbconfig/20250909-093259-ladsgroup.json
  • 09:32 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 09:29 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bullseye
  • 09:29 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2170 - Upgrading db2170.codfw.wmnet
  • 09:29 ladsgroup@cumin1003: START - Cookbook sre.mysql.depool db2170 - Upgrading db2170.codfw.wmnet
  • 09:29 ladsgroup@cumin1003: START - Cookbook sre.mysql.upgrade for db2170.codfw.wmnet
  • 09:28 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 09:14 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T402763)', diff saved to https://phabricator.wikimedia.org/P82883 and previous config saved to /var/cache/conftool/dbconfig/20250909-091405-ladsgroup.json
  • 09:12 moritzm: installing openssh security updates on Bullseye
  • 08:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P82882 and previous config saved to /var/cache/conftool/dbconfig/20250909-085858-ladsgroup.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install3003.wikimedia.org
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:46 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install3003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:45 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet
  • 08:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P82881 and previous config saved to /var/cache/conftool/dbconfig/20250909-084350-ladsgroup.json
  • 08:42 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install3003.wikimedia.org
  • 08:32 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
  • 08:30 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 08:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T402763)', diff saved to https://phabricator.wikimedia.org/P82880 and previous config saved to /var/cache/conftool/dbconfig/20250909-082842-ladsgroup.json
  • 08:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1244 (T402763)', diff saved to https://phabricator.wikimedia.org/P82879 and previous config saved to /var/cache/conftool/dbconfig/20250909-080815-ladsgroup.json
  • 08:08 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 08:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T402763)', diff saved to https://phabricator.wikimedia.org/P82878 and previous config saved to /var/cache/conftool/dbconfig/20250909-080752-ladsgroup.json
  • 08:01 XioNoX: push pfw policy - T403972
  • 07:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P82877 and previous config saved to /var/cache/conftool/dbconfig/20250909-075245-ladsgroup.json
  • 07:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82876 and previous config saved to /var/cache/conftool/dbconfig/20250909-074229-ladsgroup.json
  • 07:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T402925)', diff saved to https://phabricator.wikimedia.org/P82875 and previous config saved to /var/cache/conftool/dbconfig/20250909-074205-ladsgroup.json
  • 07:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P82874 and previous config saved to /var/cache/conftool/dbconfig/20250909-073737-ladsgroup.json
  • 07:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P82873 and previous config saved to /var/cache/conftool/dbconfig/20250909-072658-ladsgroup.json
  • 07:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T402763)', diff saved to https://phabricator.wikimedia.org/P82872 and previous config saved to /var/cache/conftool/dbconfig/20250909-072230-ladsgroup.json
  • 07:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P82871 and previous config saved to /var/cache/conftool/dbconfig/20250909-071150-ladsgroup.json
  • 07:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1243 (T402763)', diff saved to https://phabricator.wikimedia.org/P82870 and previous config saved to /var/cache/conftool/dbconfig/20250909-070251-ladsgroup.json
  • 07:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T402763)', diff saved to https://phabricator.wikimedia.org/P82869 and previous config saved to /var/cache/conftool/dbconfig/20250909-070228-ladsgroup.json
  • 06:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T402925)', diff saved to https://phabricator.wikimedia.org/P82868 and previous config saved to /var/cache/conftool/dbconfig/20250909-065643-ladsgroup.json
  • 06:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2153 (T402925)', diff saved to https://phabricator.wikimedia.org/P82867 and previous config saved to /var/cache/conftool/dbconfig/20250909-065533-ladsgroup.json
  • 06:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T402925)', diff saved to https://phabricator.wikimedia.org/P82866 and previous config saved to /var/cache/conftool/dbconfig/20250909-065509-ladsgroup.json
  • 06:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P82865 and previous config saved to /var/cache/conftool/dbconfig/20250909-064721-ladsgroup.json
  • 06:40 eileen: civicrm upgraded from 87047980 to 4ac726d1
  • 06:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P82864 and previous config saved to /var/cache/conftool/dbconfig/20250909-064002-ladsgroup.json
  • 06:39 jmm@dns1004: END - running authdns-update
  • 06:38 jmm@dns1004: START - running authdns-update
  • 06:37 jmm@dns1004: END - running authdns-update
  • 06:36 jmm@dns1004: START - running authdns-update
  • 06:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P82863 and previous config saved to /var/cache/conftool/dbconfig/20250909-063213-ladsgroup.json
  • 06:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P82862 and previous config saved to /var/cache/conftool/dbconfig/20250909-062454-ladsgroup.json
  • 06:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T402763)', diff saved to https://phabricator.wikimedia.org/P82861 and previous config saved to /var/cache/conftool/dbconfig/20250909-061706-ladsgroup.json
  • 06:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T402925)', diff saved to https://phabricator.wikimedia.org/P82860 and previous config saved to /var/cache/conftool/dbconfig/20250909-060947-ladsgroup.json
  • 06:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2146 (T402925)', diff saved to https://phabricator.wikimedia.org/P82859 and previous config saved to /var/cache/conftool/dbconfig/20250909-060837-ladsgroup.json
  • 06:08 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T402925)', diff saved to https://phabricator.wikimedia.org/P82858 and previous config saved to /var/cache/conftool/dbconfig/20250909-060814-ladsgroup.json
  • 05:58 eileen: civicrm upgraded from 88b8c27b to 87047980
  • 05:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1242 (T402763)', diff saved to https://phabricator.wikimedia.org/P82857 and previous config saved to /var/cache/conftool/dbconfig/20250909-055748-ladsgroup.json
  • 05:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T402763)', diff saved to https://phabricator.wikimedia.org/P82856 and previous config saved to /var/cache/conftool/dbconfig/20250909-055724-ladsgroup.json
  • 05:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P82855 and previous config saved to /var/cache/conftool/dbconfig/20250909-055307-ladsgroup.json
  • 05:48 eileen: civicrm upgraded from d5afef7b to 88b8c27b
  • 05:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P82854 and previous config saved to /var/cache/conftool/dbconfig/20250909-054216-ladsgroup.json
  • 05:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P82853 and previous config saved to /var/cache/conftool/dbconfig/20250909-053759-ladsgroup.json
  • 05:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7063
  • 05:34 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 7063
  • 05:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P82852 and previous config saved to /var/cache/conftool/dbconfig/20250909-052708-ladsgroup.json
  • 05:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T402925)', diff saved to https://phabricator.wikimedia.org/P82851 and previous config saved to /var/cache/conftool/dbconfig/20250909-052251-ladsgroup.json
  • 05:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2145 (T402925)', diff saved to https://phabricator.wikimedia.org/P82850 and previous config saved to /var/cache/conftool/dbconfig/20250909-052142-ladsgroup.json
  • 05:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 05:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 05:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T402763)', diff saved to https://phabricator.wikimedia.org/P82849 and previous config saved to /var/cache/conftool/dbconfig/20250909-051201-ladsgroup.json
  • 04:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1241 (T402763)', diff saved to https://phabricator.wikimedia.org/P82848 and previous config saved to /var/cache/conftool/dbconfig/20250909-045348-ladsgroup.json
  • 04:53 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T402763)', diff saved to https://phabricator.wikimedia.org/P82847 and previous config saved to /var/cache/conftool/dbconfig/20250909-045325-ladsgroup.json
  • 04:52 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 04:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T402925)', diff saved to https://phabricator.wikimedia.org/P82846 and previous config saved to /var/cache/conftool/dbconfig/20250909-045155-ladsgroup.json
  • 04:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P82845 and previous config saved to /var/cache/conftool/dbconfig/20250909-043818-ladsgroup.json
  • 04:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P82844 and previous config saved to /var/cache/conftool/dbconfig/20250909-043647-ladsgroup.json
  • 04:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T402925)', diff saved to https://phabricator.wikimedia.org/P82843 and previous config saved to /var/cache/conftool/dbconfig/20250909-043302-ladsgroup.json
  • 04:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P82842 and previous config saved to /var/cache/conftool/dbconfig/20250909-042310-ladsgroup.json
  • 04:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P82841 and previous config saved to /var/cache/conftool/dbconfig/20250909-042140-ladsgroup.json
  • 04:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P82840 and previous config saved to /var/cache/conftool/dbconfig/20250909-041755-ladsgroup.json
  • 04:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T402763)', diff saved to https://phabricator.wikimedia.org/P82839 and previous config saved to /var/cache/conftool/dbconfig/20250909-040802-ladsgroup.json
  • 04:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T402925)', diff saved to https://phabricator.wikimedia.org/P82838 and previous config saved to /var/cache/conftool/dbconfig/20250909-040632-ladsgroup.json
  • 04:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P82837 and previous config saved to /var/cache/conftool/dbconfig/20250909-040247-ladsgroup.json
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.15 (duration: 01m 04s)
  • 03:56 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.18 refs T396379 (duration: 52m 41s)
  • 03:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1238 (T402763)', diff saved to https://phabricator.wikimedia.org/P82836 and previous config saved to /var/cache/conftool/dbconfig/20250909-034924-ladsgroup.json
  • 03:49 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 03:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T402763)', diff saved to https://phabricator.wikimedia.org/P82835 and previous config saved to /var/cache/conftool/dbconfig/20250909-034913-ladsgroup.json
  • 03:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T402925)', diff saved to https://phabricator.wikimedia.org/P82834 and previous config saved to /var/cache/conftool/dbconfig/20250909-034740-ladsgroup.json
  • 03:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1251 (T402925)', diff saved to https://phabricator.wikimedia.org/P82833 and previous config saved to /var/cache/conftool/dbconfig/20250909-033605-ladsgroup.json
  • 03:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 03:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P82832 and previous config saved to /var/cache/conftool/dbconfig/20250909-033405-ladsgroup.json
  • 03:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P82831 and previous config saved to /var/cache/conftool/dbconfig/20250909-031858-ladsgroup.json
  • 03:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2216 (T402925)', diff saved to https://phabricator.wikimedia.org/P82830 and previous config saved to /var/cache/conftool/dbconfig/20250909-031731-ladsgroup.json
  • 03:17 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T402763)', diff saved to https://phabricator.wikimedia.org/P82829 and previous config saved to /var/cache/conftool/dbconfig/20250909-030350-ladsgroup.json
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.18 refs T396379
  • 02:58 eileen: civicrm upgraded from fd5ed1ac to e39c8557
  • 02:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1221 (T402763)', diff saved to https://phabricator.wikimedia.org/P82828 and previous config saved to /var/cache/conftool/dbconfig/20250909-024303-ladsgroup.json
  • 02:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 02:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T402763)', diff saved to https://phabricator.wikimedia.org/P82827 and previous config saved to /var/cache/conftool/dbconfig/20250909-024221-ladsgroup.json
  • 02:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T402925)', diff saved to https://phabricator.wikimedia.org/P82826 and previous config saved to /var/cache/conftool/dbconfig/20250909-023146-ladsgroup.json
  • 02:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P82825 and previous config saved to /var/cache/conftool/dbconfig/20250909-022713-ladsgroup.json
  • 02:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P82824 and previous config saved to /var/cache/conftool/dbconfig/20250909-021206-ladsgroup.json
  • 01:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2203 (T402925)', diff saved to https://phabricator.wikimedia.org/P82823 and previous config saved to /var/cache/conftool/dbconfig/20250909-015943-ladsgroup.json
  • 01:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 01:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T402763)', diff saved to https://phabricator.wikimedia.org/P82822 and previous config saved to /var/cache/conftool/dbconfig/20250909-015658-ladsgroup.json
  • 01:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1199 (T402763)', diff saved to https://phabricator.wikimedia.org/P82821 and previous config saved to /var/cache/conftool/dbconfig/20250909-013521-ladsgroup.json
  • 01:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 01:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T402763)', diff saved to https://phabricator.wikimedia.org/P82820 and previous config saved to /var/cache/conftool/dbconfig/20250909-013458-ladsgroup.json
  • 01:30 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 01:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T402925)', diff saved to https://phabricator.wikimedia.org/P82819 and previous config saved to /var/cache/conftool/dbconfig/20250909-013000-ladsgroup.json
  • 01:21 krinkle@deploy1003: Sync cancelled.
  • 01:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P82818 and previous config saved to /var/cache/conftool/dbconfig/20250909-011950-ladsgroup.json
  • 01:19 krinkle@deploy1003: krinkle: Backport for tests: Add test for wmfApplyEtcdDBConfig() synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P82817 and previous config saved to /var/cache/conftool/dbconfig/20250909-011452-ladsgroup.json
  • 01:13 krinkle@deploy1003: Started scap sync-world: Backport for tests: Add test for wmfApplyEtcdDBConfig()
  • 01:13 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 10s)
  • 01:08 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1158 gradually with 4 steps - Maint over
  • 01:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P82815 and previous config saved to /var/cache/conftool/dbconfig/20250909-010439-ladsgroup.json
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P82814 and previous config saved to /var/cache/conftool/dbconfig/20250909-005944-ladsgroup.json
  • 00:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T402763)', diff saved to https://phabricator.wikimedia.org/P82812 and previous config saved to /var/cache/conftool/dbconfig/20250909-004931-ladsgroup.json
  • 00:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 00:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 00:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T402925)', diff saved to https://phabricator.wikimedia.org/P82811 and previous config saved to /var/cache/conftool/dbconfig/20250909-004437-ladsgroup.json
  • 00:42 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 00:39 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1223 gradually with 4 steps - Maint over
  • 00:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 00:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 00:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 00:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 00:31 eileen: civicrm upgraded from 1ec5de94 to fd5ed1ac
  • 00:30 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 00:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 00:29 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 00:29 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 00:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1190 (T402763)', diff saved to https://phabricator.wikimedia.org/P82808 and previous config saved to /var/cache/conftool/dbconfig/20250909-002833-ladsgroup.json
  • 00:28 eileen: civicrm upgraded from 1ec5de94 to fd5ed1ac
  • 00:28 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 00:26 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 00:23 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1158 gradually with 4 steps - Maint over
  • 00:15 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 00:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T402925)', diff saved to https://phabricator.wikimedia.org/P82804 and previous config saved to /var/cache/conftool/dbconfig/20250909-001522-ladsgroup.json
  • 00:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2188 (T402925)', diff saved to https://phabricator.wikimedia.org/P82803 and previous config saved to /var/cache/conftool/dbconfig/20250909-001457-ladsgroup.json
  • 00:14 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T402925)', diff saved to https://phabricator.wikimedia.org/P82802 and previous config saved to /var/cache/conftool/dbconfig/20250909-001434-ladsgroup.json
  • 00:11 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1158.eqiad.wmnet with reason: Upgrade to 10.11
  • 00:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Upgrade db1158 to MariaDB 10.11 (T399955)', diff saved to https://phabricator.wikimedia.org/P82801 and previous config saved to /var/cache/conftool/dbconfig/20250909-001050-ladsgroup.json
  • 00:07 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P82799 and previous config saved to /var/cache/conftool/dbconfig/20250909-000014-ladsgroup.json

2025-09-08

  • 23:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P82798 and previous config saved to /var/cache/conftool/dbconfig/20250908-235927-ladsgroup.json
  • 23:53 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1223 gradually with 4 steps - Maint over
  • 23:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P82796 and previous config saved to /var/cache/conftool/dbconfig/20250908-234506-ladsgroup.json
  • 23:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P82795 and previous config saved to /var/cache/conftool/dbconfig/20250908-234419-ladsgroup.json
  • 23:33 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1223.eqiad.wmnet with reason: Upgrade to 10.11
  • 23:31 rzl: mw-debug$ helmfile -e eqiad -i apply --set mesh.image_name=envoy-future --set mesh.image_version=1.29.12-1 --context=5 # T403663
  • 23:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 23:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Upgrade db1223 to MariaDB 10.11 (T399548)', diff saved to https://phabricator.wikimedia.org/P82794 and previous config saved to /var/cache/conftool/dbconfig/20250908-233042-ladsgroup.json
  • 23:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T402925)', diff saved to https://phabricator.wikimedia.org/P82793 and previous config saved to /var/cache/conftool/dbconfig/20250908-232958-ladsgroup.json
  • 23:29 ladsgroup@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1223 gradually with 4 steps - Maint over
  • 23:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T402925)', diff saved to https://phabricator.wikimedia.org/P82791 and previous config saved to /var/cache/conftool/dbconfig/20250908-232912-ladsgroup.json
  • 23:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 23:21 jdlrobson@deploy1003: Finished scap sync-world: Backport for Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled, Temporarily use production for summary endpoint (T400694) (duration: 16m 06s)
  • 23:16 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 23:11 jdlrobson@deploy1003: jdlrobson: Backport for Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled, Temporarily use production for summary endpoint (T400694) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:10 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1223 gradually with 4 steps - Maint over
  • 23:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1223.eqiad.wmnet
  • 23:08 eileen: civicrm upgraded from c7ebd726 to 1ec5de94
  • 23:05 jdlrobson@deploy1003: Started scap sync-world: Backport for Cleanup: Simplify configuration for wgSpecialContributeSkinsEnabled, Temporarily use production for summary endpoint (T400694)
  • 23:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1223 - Upgrading db1223.eqiad.wmnet
  • 23:02 ladsgroup@cumin1002: START - Cookbook sre.mysql.depool db1223 - Upgrading db1223.eqiad.wmnet
  • 23:02 ladsgroup@cumin1002: START - Cookbook sre.mysql.upgrade for db1223.eqiad.wmnet
  • 22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1223 T404025', diff saved to https://phabricator.wikimedia.org/P82789 and previous config saved to /var/cache/conftool/dbconfig/20250908-225603-ladsgroup.json
  • 22:54 ladsgroup@dns1004: END - running authdns-update
  • 22:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2176 (T402925)', diff saved to https://phabricator.wikimedia.org/P82788 and previous config saved to /var/cache/conftool/dbconfig/20250908-225313-ladsgroup.json
  • 22:53 ladsgroup@dns1004: START - running authdns-update
  • 22:53 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82787 and previous config saved to /var/cache/conftool/dbconfig/20250908-225250-ladsgroup.json
  • 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote db1189 to s3 primary and set section read-write T404025', diff saved to https://phabricator.wikimedia.org/P82786 and previous config saved to /var/cache/conftool/dbconfig/20250908-225054-ladsgroup.json
  • 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T404025', diff saved to https://phabricator.wikimedia.org/P82785 and previous config saved to /var/cache/conftool/dbconfig/20250908-224914-ladsgroup.json
  • 22:48 Amir1: Starting s3 eqiad failover from db1223 to db1189 - T404025
  • 22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set db1189 with weight 0 T404025', diff saved to https://phabricator.wikimedia.org/P82784 and previous config saved to /var/cache/conftool/dbconfig/20250908-224330-ladsgroup.json
  • 22:42 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T404025
  • 22:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P82783 and previous config saved to /var/cache/conftool/dbconfig/20250908-223742-ladsgroup.json
  • 22:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1235 (T402925)', diff saved to https://phabricator.wikimedia.org/P82782 and previous config saved to /var/cache/conftool/dbconfig/20250908-223528-ladsgroup.json
  • 22:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 22:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T402925)', diff saved to https://phabricator.wikimedia.org/P82781 and previous config saved to /var/cache/conftool/dbconfig/20250908-223504-ladsgroup.json
  • 22:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 22:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P82780 and previous config saved to /var/cache/conftool/dbconfig/20250908-222235-ladsgroup.json
  • 22:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P82779 and previous config saved to /var/cache/conftool/dbconfig/20250908-221956-ladsgroup.json
  • 22:07 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 22:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82778 and previous config saved to /var/cache/conftool/dbconfig/20250908-220728-ladsgroup.json
  • 22:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P82777 and previous config saved to /var/cache/conftool/dbconfig/20250908-220449-ladsgroup.json
  • 21:58 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:57 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:52 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:51 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T402925)', diff saved to https://phabricator.wikimedia.org/P82776 and previous config saved to /var/cache/conftool/dbconfig/20250908-214941-ladsgroup.json
  • 21:46 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:45 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:43 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:42 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:38 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:38 maryum: Deployed security fix for T403408
  • 21:37 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82775 and previous config saved to /var/cache/conftool/dbconfig/20250908-213103-ladsgroup.json
  • 21:30 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T402925)', diff saved to https://phabricator.wikimedia.org/P82774 and previous config saved to /var/cache/conftool/dbconfig/20250908-213040-ladsgroup.json
  • 21:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:21 catrope@deploy1003: Finished scap sync-world: Backport for Fix display of Codex message icons II (T401457) (duration: 16m 46s)
  • 21:20 bking@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for wdqs2025.codfw.wmnet: Renew puppet certificate - bking@cumin1002
  • 21:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1234 (T402925)', diff saved to https://phabricator.wikimedia.org/P82773 and previous config saved to /var/cache/conftool/dbconfig/20250908-211902-ladsgroup.json
  • 21:18 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T402925)', diff saved to https://phabricator.wikimedia.org/P82772 and previous config saved to /var/cache/conftool/dbconfig/20250908-211840-ladsgroup.json
  • 21:15 catrope@deploy1003: catrope: Continuing with sync
  • 21:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P82771 and previous config saved to /var/cache/conftool/dbconfig/20250908-211532-ladsgroup.json
  • 21:15 RoanKattouw: Ran updateCollation.php on lbwiki for T402083
  • 21:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:10 catrope@deploy1003: catrope: Backport for Fix display of Codex message icons II (T401457) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:04 catrope@deploy1003: Started scap sync-world: Backport for Fix display of Codex message icons II (T401457)
  • 21:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P82770 and previous config saved to /var/cache/conftool/dbconfig/20250908-210332-ladsgroup.json
  • 21:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:01 damilare: donorportal upgraded from cb76e2b7 to bd6de034
  • 21:01 catrope@deploy1003: Finished scap sync-world: Backport for [lbwiki] Change to 'uca-lb-u-kn' category collation (T402083) (duration: 12m 11s)
  • 21:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P82769 and previous config saved to /var/cache/conftool/dbconfig/20250908-210024-ladsgroup.json
  • 20:55 catrope@deploy1003: catrope, superpes: Continuing with sync
  • 20:55 catrope@deploy1003: catrope, superpes: Backport for [lbwiki] Change to 'uca-lb-u-kn' category collation (T402083) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:49 catrope@deploy1003: Started scap sync-world: Backport for [lbwiki] Change to 'uca-lb-u-kn' category collation (T402083)
  • 20:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P82768 and previous config saved to /var/cache/conftool/dbconfig/20250908-204824-ladsgroup.json
  • 20:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T402925)', diff saved to https://phabricator.wikimedia.org/P82767 and previous config saved to /var/cache/conftool/dbconfig/20250908-204517-ladsgroup.json
  • 20:34 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 20:33 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 20:33 mstyles@deploy1003: Finished scap sync-world: Backport for OATHAuth: Enable 2FA opt-in for 10% of users (T400579) (duration: 13m 28s)
  • 20:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T402925)', diff saved to https://phabricator.wikimedia.org/P82766 and previous config saved to /var/cache/conftool/dbconfig/20250908-203317-ladsgroup.json
  • 20:27 mstyles@deploy1003: mstyles: Continuing with sync
  • 20:26 mstyles@deploy1003: mstyles: Backport for OATHAuth: Enable 2FA opt-in for 10% of users (T400579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:24 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:24 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 20:23 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 20:19 mstyles@deploy1003: Started scap sync-world: Backport for OATHAuth: Enable 2FA opt-in for 10% of users (T400579)
  • 20:18 kemayo@deploy1003: Finished scap sync-world: Backport for Enable DT thanks at mediawikiwiki (T400849), Update VE core submodule to master (a5bd08c8b) (T302413 T391521 T397145 T401890 T402392 T397518 T402717 T403741 T403745) (duration: 13m 05s)
  • 20:12 kemayo@deploy1003: kemayo, esanders: Continuing with sync
  • 20:11 kemayo@deploy1003: kemayo, esanders: Backport for Enable DT thanks at mediawikiwiki (T400849), Update VE core submodule to master (a5bd08c8b) (T302413 T391521 T397145 T401890 T402392 T397518 T402717 T403741 T403745) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2173 (T402925)', diff saved to https://phabricator.wikimedia.org/P82765 and previous config saved to /var/cache/conftool/dbconfig/20250908-200934-ladsgroup.json
  • 20:09 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 20:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82764 and previous config saved to /var/cache/conftool/dbconfig/20250908-200910-ladsgroup.json
  • 20:05 kemayo@deploy1003: Started scap sync-world: Backport for Enable DT thanks at mediawikiwiki (T400849), Update VE core submodule to master (a5bd08c8b) (T302413 T391521 T397145 T401890 T402392 T397518 T402717 T403741 T403745)
  • 20:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1232 (T402925)', diff saved to https://phabricator.wikimedia.org/P82763 and previous config saved to /var/cache/conftool/dbconfig/20250908-200047-ladsgroup.json
  • 20:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 20:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T402925)', diff saved to https://phabricator.wikimedia.org/P82762 and previous config saved to /var/cache/conftool/dbconfig/20250908-200024-ladsgroup.json
  • 19:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P82761 and previous config saved to /var/cache/conftool/dbconfig/20250908-195402-ladsgroup.json
  • 19:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P82760 and previous config saved to /var/cache/conftool/dbconfig/20250908-194516-ladsgroup.json
  • 19:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P82759 and previous config saved to /var/cache/conftool/dbconfig/20250908-193855-ladsgroup.json
  • 19:37 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 19:37 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 19:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P82758 and previous config saved to /var/cache/conftool/dbconfig/20250908-193009-ladsgroup.json
  • 19:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 19:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82757 and previous config saved to /var/cache/conftool/dbconfig/20250908-192347-ladsgroup.json
  • 19:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T402925)', diff saved to https://phabricator.wikimedia.org/P82756 and previous config saved to /var/cache/conftool/dbconfig/20250908-191501-ladsgroup.json
  • 19:06 jgleeson: payments-wiki upgraded from 973d0a66 to bd6de034
  • 19:03 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1056.eqiad.wmnet with OS bookworm
  • 19:00 hashar@deploy1003: Finished deploy [integration/docroot@f89c693]: build: Updating mediawiki/mediawiki-codesniffer to 48.0.0 (duration: 00m 13s)
  • 19:00 hashar@deploy1003: Started deploy [integration/docroot@f89c693]: build: Updating mediawiki/mediawiki-codesniffer to 48.0.0
  • 18:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82755 and previous config saved to /var/cache/conftool/dbconfig/20250908-184718-ladsgroup.json
  • 18:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T402925)', diff saved to https://phabricator.wikimedia.org/P82754 and previous config saved to /var/cache/conftool/dbconfig/20250908-184655-ladsgroup.json
  • 18:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1219 (T402925)', diff saved to https://phabricator.wikimedia.org/P82753 and previous config saved to /var/cache/conftool/dbconfig/20250908-184517-ladsgroup.json
  • 18:45 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T402925)', diff saved to https://phabricator.wikimedia.org/P82752 and previous config saved to /var/cache/conftool/dbconfig/20250908-184454-ladsgroup.json
  • 18:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 18:36 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:35 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:35 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1184893 and https://gerrit.wikimedia.org/r/1185984 (duration: 14m 34s)
  • 18:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P82751 and previous config saved to /var/cache/conftool/dbconfig/20250908-183147-ladsgroup.json
  • 18:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P82750 and previous config saved to /var/cache/conftool/dbconfig/20250908-182946-ladsgroup.json
  • 18:29 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1184893 and https://gerrit.wikimedia.org/r/1185984
  • 18:26 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 18:25 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 18:22 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1056.eqiad.wmnet with OS bookworm
  • 18:21 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 18:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P82749 and previous config saved to /var/cache/conftool/dbconfig/20250908-181640-ladsgroup.json
  • 18:16 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 18:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P82748 and previous config saved to /var/cache/conftool/dbconfig/20250908-181439-ladsgroup.json
  • 18:11 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 18:10 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 18:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1054.eqiad.wmnet with OS bookworm
  • 18:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 18:05 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 18:05 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
  • 18:04 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
  • 18:04 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
  • 18:04 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
  • 18:02 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 18:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T402925)', diff saved to https://phabricator.wikimedia.org/P82747 and previous config saved to /var/cache/conftool/dbconfig/20250908-180131-ladsgroup.json
  • 17:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T402925)', diff saved to https://phabricator.wikimedia.org/P82746 and previous config saved to /var/cache/conftool/dbconfig/20250908-175931-ladsgroup.json
  • 17:57 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove frdata2001 and frmx2001 - sukhe@cumin1003"
  • 17:57 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove frdata2001 and frmx2001 - sukhe@cumin1003"
  • 17:53 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 17:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 17:51 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 17:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1054.eqiad.wmnet with reason: host reimage
  • 17:38 swfrench-wmf: updated all shellbox services to 2025-08-29-172844 (+ envoy 1.26.8-1) in eqiad - T403284
  • 17:37 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 17:36 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1054.eqiad.wmnet with reason: host reimage
  • 17:36 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 17:36 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 17:35 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:35 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 17:35 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:34 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 17:34 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:33 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 17:32 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 17:32 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 17:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:28 denisse: Upgrade envoyproxy on graphite hosts - T402584
  • 17:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1218 (T402925)', diff saved to https://phabricator.wikimedia.org/P82745 and previous config saved to /var/cache/conftool/dbconfig/20250908-172706-ladsgroup.json
  • 17:27 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2153 (T402925)', diff saved to https://phabricator.wikimedia.org/P82744 and previous config saved to /var/cache/conftool/dbconfig/20250908-172657-ladsgroup.json
  • 17:26 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T402925)', diff saved to https://phabricator.wikimedia.org/P82743 and previous config saved to /var/cache/conftool/dbconfig/20250908-172644-ladsgroup.json
  • 17:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T402925)', diff saved to https://phabricator.wikimedia.org/P82742 and previous config saved to /var/cache/conftool/dbconfig/20250908-172634-ladsgroup.json
  • 17:26 denisse: Upgrade envoyproxy on titan hosts - T402584
  • 17:26 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1056.eqiad.wmnet with OS bookworm
  • 17:23 denisse: Upgrade envoyproxy on titan1001 - T402584
  • 17:23 swfrench-wmf: updated all shellbox services to 2025-08-29-172844 (+ envoy 1.26.8-1) in codfw - T403284
  • 17:22 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 17:21 denisse: Upgrade envoyproxy on prometheus::pop hosts - T402584
  • 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 17:20 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:20 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 17:19 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:19 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:17 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 17:17 denisse: Upgrade envoyproxy on prometheus[1006-1008] and [2005-2008] - T402584
  • 17:17 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 17:16 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 17:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 17:13 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 17:12 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 17:11 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 17:11 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 17:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P82741 and previous config saved to /var/cache/conftool/dbconfig/20250908-171136-ladsgroup.json
  • 17:11 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 17:11 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P82740 and previous config saved to /var/cache/conftool/dbconfig/20250908-171126-ladsgroup.json
  • 17:11 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 17:11 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 17:10 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 17:10 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 17:10 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 17:10 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 17:10 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1054.eqiad.wmnet with OS bookworm
  • 17:10 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 17:09 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1054.eqiad.wmnet with OS bookworm
  • 16:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P82739 and previous config saved to /var/cache/conftool/dbconfig/20250908-165629-ladsgroup.json
  • 16:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P82738 and previous config saved to /var/cache/conftool/dbconfig/20250908-165619-ladsgroup.json
  • 16:46 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 16:46 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 16:43 denisse: Upgrade envoyproxy on prometheus1005 - T402584
  • 16:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T402925)', diff saved to https://phabricator.wikimedia.org/P82737 and previous config saved to /var/cache/conftool/dbconfig/20250908-164121-ladsgroup.json
  • 16:41 denisse: Upgrade envoyproxy on grafana1002 - T402584
  • 16:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T402925)', diff saved to https://phabricator.wikimedia.org/P82736 and previous config saved to /var/cache/conftool/dbconfig/20250908-164111-ladsgroup.json
  • 16:40 denisse: Upgrade envoyproxy on grafana2001 - T402584
  • 16:28 larssandergreen: Updating civicrm from d4a2ed6e to c7ebd726
  • 16:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 16:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1056.eqiad.wmnet with OS bookworm
  • 16:22 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1054.eqiad.wmnet with OS bookworm
  • 16:21 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 16:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1057.eqiad.wmnet with OS bookworm
  • 16:16 vriley@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 16:12 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
  • 16:12 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
  • 16:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1206 (T402925)', diff saved to https://phabricator.wikimedia.org/P82735 and previous config saved to /var/cache/conftool/dbconfig/20250908-161212-ladsgroup.json
  • 16:12 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T402925)', diff saved to https://phabricator.wikimedia.org/P82734 and previous config saved to /var/cache/conftool/dbconfig/20250908-161149-ladsgroup.json
  • 16:06 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 00s)
  • 16:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2146 (T402925)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250908-160512-ladsgroup.json
  • 16:05 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T402925)', diff saved to https://phabricator.wikimedia.org/P82733 and previous config saved to /var/cache/conftool/dbconfig/20250908-160449-ladsgroup.json
  • 16:04 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 15m 14s)
  • 15:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P82732 and previous config saved to /var/cache/conftool/dbconfig/20250908-155641-ladsgroup.json
  • 15:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P82731 and previous config saved to /var/cache/conftool/dbconfig/20250908-154942-ladsgroup.json
  • 15:49 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 15:48 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 15:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P82730 and previous config saved to /var/cache/conftool/dbconfig/20250908-154134-ladsgroup.json
  • 15:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P82729 and previous config saved to /var/cache/conftool/dbconfig/20250908-153434-ladsgroup.json
  • 15:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T402925)', diff saved to https://phabricator.wikimedia.org/P82728 and previous config saved to /var/cache/conftool/dbconfig/20250908-152626-ladsgroup.json
  • 15:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 15:20 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 15:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T402925)', diff saved to https://phabricator.wikimedia.org/P82727 and previous config saved to /var/cache/conftool/dbconfig/20250908-151926-ladsgroup.json
  • 15:15 btullis@cumin1003: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100*.eqiad.wmnet} and (A:cephosd)
  • 14:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 14:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T401906)', diff saved to https://phabricator.wikimedia.org/P82726 and previous config saved to /var/cache/conftool/dbconfig/20250908-145454-fceratto.json
  • 14:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1196 (T402925)', diff saved to https://phabricator.wikimedia.org/P82725 and previous config saved to /var/cache/conftool/dbconfig/20250908-145215-ladsgroup.json
  • 14:52 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T402925)', diff saved to https://phabricator.wikimedia.org/P82724 and previous config saved to /var/cache/conftool/dbconfig/20250908-145144-ladsgroup.json
  • 14:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 14:46 urbanecm@deploy1003: Finished scap sync-world: Backport for Disable User Agent collection for MinT for Readers streams (T398057), hawiki: remove temporary logo files (T376049) (duration: 45m 20s)
  • 14:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2145 (T402925)', diff saved to https://phabricator.wikimedia.org/P82723 and previous config saved to /var/cache/conftool/dbconfig/20250908-144309-ladsgroup.json
  • 14:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P82722 and previous config saved to /var/cache/conftool/dbconfig/20250908-143947-fceratto.json
  • 14:39 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 14:39 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 14:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P82721 and previous config saved to /var/cache/conftool/dbconfig/20250908-143637-ladsgroup.json
  • 14:33 urbanecm@deploy1003: kcvelaga, urbanecm, anzx: Continuing with sync
  • 14:32 urbanecm@deploy1003: kcvelaga, urbanecm, anzx: Backport for Disable User Agent collection for MinT for Readers streams (T398057), hawiki: remove temporary logo files (T376049) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1233.eqiad.wmnet with OS bullseye
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P82720 and previous config saved to /var/cache/conftool/dbconfig/20250908-142439-fceratto.json
  • 14:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P82719 and previous config saved to /var/cache/conftool/dbconfig/20250908-142129-ladsgroup.json
  • 14:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum1002.eqiad.wmnet
  • 14:15 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 14:15 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum1002.eqiad.wmnet
  • 14:13 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1233.eqiad.wmnet with reason: host reimage
  • 14:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T401906)', diff saved to https://phabricator.wikimedia.org/P82718 and previous config saved to /var/cache/conftool/dbconfig/20250908-140932-fceratto.json
  • 14:09 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:08 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1233.eqiad.wmnet with reason: host reimage
  • 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T401906)', diff saved to https://phabricator.wikimedia.org/P82717 and previous config saved to /var/cache/conftool/dbconfig/20250908-140705-fceratto.json
  • 14:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 14:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T402925)', diff saved to https://phabricator.wikimedia.org/P82716 and previous config saved to /var/cache/conftool/dbconfig/20250908-140622-ladsgroup.json
  • 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T401906)', diff saved to https://phabricator.wikimedia.org/P82715 and previous config saved to /var/cache/conftool/dbconfig/20250908-140607-fceratto.json
  • 14:00 urbanecm@deploy1003: Started scap sync-world: Backport for Disable User Agent collection for MinT for Readers streams (T398057), hawiki: remove temporary logo files (T376049)
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P82714 and previous config saved to /var/cache/conftool/dbconfig/20250908-135100-fceratto.json
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3004.wikimedia.org
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh3004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1233.eqiad.wmnet with OS bullseye
  • 13:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 13:39 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh3004.wikimedia.org
  • 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P82713 and previous config saved to /var/cache/conftool/dbconfig/20250908-133552-fceratto.json
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum3004.esams.wmnet
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum3004.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1233.eqiad.wmnet with OS bullseye
  • 13:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum3004.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1195 (T402925)', diff saved to https://phabricator.wikimedia.org/P82712 and previous config saved to /var/cache/conftool/dbconfig/20250908-133021-ladsgroup.json
  • 13:30 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T402925)', diff saved to https://phabricator.wikimedia.org/P82711 and previous config saved to /var/cache/conftool/dbconfig/20250908-132958-ladsgroup.json
  • 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum3004.esams.wmnet
  • 13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T401906)', diff saved to https://phabricator.wikimedia.org/P82710 and previous config saved to /var/cache/conftool/dbconfig/20250908-132044-fceratto.json
  • 13:20 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum4001.ulsfo.wmnet
  • 13:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum2001.codfw.wmnet
  • 13:19 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum1001.eqiad.wmnet
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T401906)', diff saved to https://phabricator.wikimedia.org/P82709 and previous config saved to /var/cache/conftool/dbconfig/20250908-131818-fceratto.json
  • 13:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T401906)', diff saved to https://phabricator.wikimedia.org/P82708 and previous config saved to /var/cache/conftool/dbconfig/20250908-131755-fceratto.json
  • 13:15 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum4001.ulsfo.wmnet
  • 13:15 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum2001.codfw.wmnet
  • 13:15 sukhe@cumin1003: START - Cookbook sre.hosts.reboot-single for host durum1001.eqiad.wmnet
  • 13:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P82707 and previous config saved to /var/cache/conftool/dbconfig/20250908-131451-ladsgroup.json
  • 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P82706 and previous config saved to /var/cache/conftool/dbconfig/20250908-130247-fceratto.json
  • 12:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P82705 and previous config saved to /var/cache/conftool/dbconfig/20250908-125943-ladsgroup.json
  • 12:51 btullis@cumin1003: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100*.eqiad.wmnet} and (A:cephosd)
  • 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P82703 and previous config saved to /var/cache/conftool/dbconfig/20250908-124739-fceratto.json
  • 12:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T402925)', diff saved to https://phabricator.wikimedia.org/P82702 and previous config saved to /var/cache/conftool/dbconfig/20250908-124436-ladsgroup.json
  • 12:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:42 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh3005.wikimedia.org to drbd
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T401906)', diff saved to https://phabricator.wikimedia.org/P82701 and previous config saved to /var/cache/conftool/dbconfig/20250908-123232-fceratto.json
  • 12:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh3005.wikimedia.org to drbd
  • 12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T401906)', diff saved to https://phabricator.wikimedia.org/P82700 and previous config saved to /var/cache/conftool/dbconfig/20250908-123007-fceratto.json
  • 12:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T401906)', diff saved to https://phabricator.wikimedia.org/P82699 and previous config saved to /var/cache/conftool/dbconfig/20250908-122952-fceratto.json
  • 12:16 btullis@cumin1003: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
  • 12:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P82698 and previous config saved to /var/cache/conftool/dbconfig/20250908-121444-fceratto.json
  • 12:12 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1186 (T402925)', diff saved to https://phabricator.wikimedia.org/P82697 and previous config saved to /var/cache/conftool/dbconfig/20250908-121000-ladsgroup.json
  • 12:09 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T402925)', diff saved to https://phabricator.wikimedia.org/P82696 and previous config saved to /var/cache/conftool/dbconfig/20250908-120937-ladsgroup.json
  • 11:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P82695 and previous config saved to /var/cache/conftool/dbconfig/20250908-115937-fceratto.json
  • 11:57 btullis@cumin1003: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
  • 11:55 moritzm: Upgrading trixie installer image to 13.1 T403815
  • 11:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P82694 and previous config saved to /var/cache/conftool/dbconfig/20250908-115429-ladsgroup.json
  • 11:49 btullis@cumin1003: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum3005.esams.wmnet to drbd
  • 11:45 btullis@cumin1003: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
  • 11:44 topranks: restart netbox service on netbox-dev2003 (netbox-next) to update db from live server dump
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T401906)', diff saved to https://phabricator.wikimedia.org/P82693 and previous config saved to /var/cache/conftool/dbconfig/20250908-114429-fceratto.json
  • 11:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1233.eqiad.wmnet with OS bullseye
  • 11:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1233.eqiad.wmnet with OS bullseye
  • 11:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P82692 and previous config saved to /var/cache/conftool/dbconfig/20250908-113922-ladsgroup.json
  • 11:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum3005.esams.wmnet to drbd
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir3004.esams.wmnet
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3004.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3004.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T402925)', diff saved to https://phabricator.wikimedia.org/P82691 and previous config saved to /var/cache/conftool/dbconfig/20250908-112414-ladsgroup.json
  • 11:19 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir3004.esams.wmnet
  • 11:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install3004.wikimedia.org
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install3004.wikimedia.org with OS bookworm
  • 11:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:05 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:04 brouberol@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:03 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:01 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:58 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:57 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install3004.wikimedia.org with reason: host reimage
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doh3006.wikimedia.org
  • 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host doh3006.wikimedia.org
  • 10:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install3004.wikimedia.org with reason: host reimage
  • 10:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1169 (T402925)', diff saved to https://phabricator.wikimedia.org/P82690 and previous config saved to /var/cache/conftool/dbconfig/20250908-104652-ladsgroup.json
  • 10:46 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T402925)', diff saved to https://phabricator.wikimedia.org/P82689 and previous config saved to /var/cache/conftool/dbconfig/20250908-104629-ladsgroup.json
  • 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T401906)', diff saved to https://phabricator.wikimedia.org/P82688 and previous config saved to /var/cache/conftool/dbconfig/20250908-104413-fceratto.json
  • 10:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T401906)', diff saved to https://phabricator.wikimedia.org/P82687 and previous config saved to /var/cache/conftool/dbconfig/20250908-104350-fceratto.json
  • 10:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2009.codfw.wmnet
  • 10:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe2009.codfw.wmnet
  • 10:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P82686 and previous config saved to /var/cache/conftool/dbconfig/20250908-103122-ladsgroup.json
  • 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P82685 and previous config saved to /var/cache/conftool/dbconfig/20250908-102842-fceratto.json
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 10:16 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Update wmf-plugin IBGP output - cmooney@cumin1003
  • 10:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P82684 and previous config saved to /var/cache/conftool/dbconfig/20250908-101614-ladsgroup.json
  • 10:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P82683 and previous config saved to /var/cache/conftool/dbconfig/20250908-101334-fceratto.json
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install3004.wikimedia.org with OS bookworm
  • 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 10:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow3003.esams.wmnet
  • 10:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
  • 10:07 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
  • 10:04 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1012.eqiad.wmnet
  • 10:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T402925)', diff saved to https://phabricator.wikimedia.org/P82682 and previous config saved to /var/cache/conftool/dbconfig/20250908-100107-ladsgroup.json
  • 09:59 ayounsi@cumin1003: START - Cookbook sre.hosts.decommission for hosts netflow3003.esams.wmnet
  • 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T401906)', diff saved to https://phabricator.wikimedia.org/P82681 and previous config saved to /var/cache/conftool/dbconfig/20250908-095826-fceratto.json
  • 09:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T401906)', diff saved to https://phabricator.wikimedia.org/P82680 and previous config saved to /var/cache/conftool/dbconfig/20250908-095602-fceratto.json
  • 09:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 09:55 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts netflow3003.esams.wmnet
  • 09:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T401906)', diff saved to https://phabricator.wikimedia.org/P82679 and previous config saved to /var/cache/conftool/dbconfig/20250908-095521-fceratto.json
  • 09:55 ayounsi@cumin1003: START - Cookbook sre.hosts.decommission for hosts netflow3003.esams.wmnet
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1012.eqiad.wmnet
  • 09:54 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:52 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:52 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
  • 09:51 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1003.eqiad.wmnet
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum3006.esams.wmnet
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1003.eqiad.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host durum3006.esams.wmnet
  • 09:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
  • 09:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P82678 and previous config saved to /var/cache/conftool/dbconfig/20250908-094013-fceratto.json
  • 09:40 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P82677 and previous config saved to /var/cache/conftool/dbconfig/20250908-092506-fceratto.json
  • 09:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1163 (T402925)', diff saved to https://phabricator.wikimedia.org/P82676 and previous config saved to /var/cache/conftool/dbconfig/20250908-092311-ladsgroup.json
  • 09:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:18 Amir1: dropping all objectcache tables everywhere (T397367)
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1005.eqiad.wmnet
  • 09:11 kart_: Updated cxserver to 2025-09-08-084009-production (T403730)
  • 09:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1005.eqiad.wmnet
  • 09:10 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install3004.wikimedia.org - jmm@cumin2002"
  • 09:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install3004.wikimedia.org - jmm@cumin2002"
  • 09:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T401906)', diff saved to https://phabricator.wikimedia.org/P82675 and previous config saved to /var/cache/conftool/dbconfig/20250908-090958-fceratto.json
  • 09:09 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 09:09 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet
  • 09:09 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install3004.wikimedia.org on all recursors
  • 09:09 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install3004.wikimedia.org on all recursors
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3004.wikimedia.org - jmm@cumin2002"
  • 09:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install3004.wikimedia.org - jmm@cumin2002"
  • 09:08 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 09:08 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 09:07 klausman@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T401906)', diff saved to https://phabricator.wikimedia.org/P82674 and previous config saved to /var/cache/conftool/dbconfig/20250908-090734-fceratto.json
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2004.codfw.wmnet
  • 09:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 09:07 klausman@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:07 klausman@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ml-lab1002.eqiad.wmnet with reason: Maintenance work for T401964
  • 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2004.codfw.wmnet
  • 09:03 klausman@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:03 klausman@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1010.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install3004.wikimedia.org
  • 08:59 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 08:58 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:54 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:54 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:54 klausman@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:50 klausman@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:48 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:47 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:45 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:45 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:44 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:44 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:42 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 08:42 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:41 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:41 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 08:39 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
  • 08:39 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:38 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:38 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:38 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:38 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
  • 08:35 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 08:35 klausman@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:35 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow3004.esams.wmnet
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow3004.esams.wmnet with OS bookworm
  • 08:34 klausman@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:34 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:34 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:32 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 08:32 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 08:28 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 08:28 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 08:28 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:28 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:27 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
  • 08:27 elukey@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: sync
  • 08:27 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1008.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:23 brouberol@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:22 brouberol@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow3004.esams.wmnet with reason: host reimage
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2003.codfw.wmnet
  • 08:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow3004.esams.wmnet with reason: host reimage
  • 08:13 klausman@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2003.codfw.wmnet
  • 07:56 godog: finished rollout of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1184793 - puppet re-enabled on C:bird
  • 07:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow3004.esams.wmnet with OS bookworm
  • 07:46 moritzm: upgrading Envoy on an-web, an-tool1007 (turnilo), an-tool1008 (yarn) T402584
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow3004.esams.wmnet - jmm@cumin2002"
  • 07:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow3004.esams.wmnet - jmm@cumin2002"
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow3004.esams.wmnet on all recursors
  • 07:33 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow3004.esams.wmnet on all recursors
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow3004.esams.wmnet - jmm@cumin2002"
  • 07:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow3004.esams.wmnet - jmm@cumin2002"
  • 07:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow3004.esams.wmnet
  • 07:09 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fgoodwin out of all services on: 2421 hosts
  • 07:09 jmm@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir3004.esams.wmnet
  • 07:09 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir3006.esams.wmnet
  • 07:08 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir3006.esams.wmnet
  • 06:34 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 06:33 hashar@deploy1003: Finished deploy [integration/docroot@9830ef2]: Changes to CoveragePage to prepare the phase out of "skin" terminology - T402398 (duration: 00m 13s)
  • 06:33 hashar@deploy1003: Started deploy [integration/docroot@9830ef2]: Changes to CoveragePage to prepare the phase out of "skin" terminology - T402398
  • 06:18 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1057.eqiad.wmnet with reason: host reimage
  • 06:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1055.eqiad.wmnet with OS bookworm
  • 06:16 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 06:15 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264927
  • 06:15 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 264927
  • 06:15 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1057.eqiad.wmnet with reason: host reimage
  • 06:13 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 06:01 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1054.eqiad.wmnet with OS bookworm
  • 05:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1055.eqiad.wmnet with reason: host reimage
  • 05:49 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1055.eqiad.wmnet with reason: host reimage
  • 05:47 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1057.eqiad.wmnet with OS bookworm
  • 05:36 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1056.eqiad.wmnet with OS bookworm
  • 05:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1054.eqiad.wmnet with OS bookworm
  • 05:32 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1054.eqiad.wmnet with OS bookworm
  • 05:31 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1057.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 05:23 kart_: Updated MinT to 2025-09-03-160715-production (T400562)
  • 05:22 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1055.eqiad.wmnet with OS bookworm
  • 05:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1053.eqiad.wmnet with OS bookworm
  • 05:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 05:16 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 05:15 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 05:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1057.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 05:08 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1057
  • 05:06 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1057
  • 05:06 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:06 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1057 - vriley@cumin1003"
  • 05:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1057 - vriley@cumin1003"
  • 05:05 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1055.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 05:05 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:02 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 04:59 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 04:58 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1053.eqiad.wmnet with reason: host reimage
  • 04:56 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 04:54 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1053.eqiad.wmnet with reason: host reimage
  • 04:53 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1056.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 04:52 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:52 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1056 - vriley@cumin1003"
  • 04:52 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1056 - vriley@cumin1003"
  • 04:50 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 04:48 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 04:46 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 04:43 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:42 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1055.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 04:38 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1055
  • 04:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1055
  • 04:36 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:36 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1055 - vriley@cumin1003"
  • 04:36 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1055 - vriley@cumin1003"
  • 04:32 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1054.eqiad.wmnet with OS bookworm
  • 04:31 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 04:27 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1053.eqiad.wmnet with OS bookworm
  • 03:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:52 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:30 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:30 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1054
  • 03:29 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1054
  • 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:28 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1053
  • 03:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1053
  • 03:26 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:26 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1053 - vriley@cumin1003"
  • 03:26 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1053 - vriley@cumin1003"
  • 03:26 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 03:20 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 03:20 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 03:17 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 02:12 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1052.eqiad.wmnet with OS bookworm
  • 02:12 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 02:11 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:53 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1052.eqiad.wmnet with reason: host reimage
  • 01:50 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1052.eqiad.wmnet with reason: host reimage
  • 01:18 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1052.eqiad.wmnet with OS bookworm
  • 01:13 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 42s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1050.eqiad.wmnet with OS bookworm
  • 00:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:21 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1051.eqiad.wmnet with OS bookworm
  • 00:21 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:12 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1052
  • 00:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1052
  • 00:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:10 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1052 - vriley@cumin1003"
  • 00:10 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1052 - vriley@cumin1003"
  • 00:07 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 00:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1051.eqiad.wmnet with reason: host reimage
  • 00:02 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 00:00 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1051.eqiad.wmnet with reason: host reimage

2025-09-07

  • 23:33 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1051.eqiad.wmnet with OS bookworm
  • 23:10 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:48 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:47 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1051
  • 22:46 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1051
  • 22:45 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:45 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1051 - vriley@cumin1003"
  • 22:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1051 - vriley@cumin1003"
  • 22:41 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 44s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-09-06

  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 42s)
  • 01:07 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1050.eqiad.wmnet with OS bookworm
  • 01:07 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:06 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:43 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
  • 00:40 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1050.eqiad.wmnet with reason: host reimage
  • 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1049.eqiad.wmnet with OS bookworm
  • 00:17 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:16 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 00:13 rzl: reprepro -C component/envoy-future include bullseye-wikimedia /home/rzl/envoyproxy/envoyproxy_1.29.12-1_amd64.changes
  • 00:07 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1050.eqiad.wmnet with OS bookworm
  • 00:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2025-09-05

  • 23:58 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1049.eqiad.wmnet with reason: host reimage
  • 23:53 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es1049.eqiad.wmnet with reason: host reimage
  • 23:40 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:39 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1050
  • 23:37 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1050
  • 23:37 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:37 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1050 - vriley@cumin1003"
  • 23:37 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1050 - vriley@cumin1003"
  • 23:31 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 23:25 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host es1049.eqiad.wmnet with OS bookworm
  • 22:50 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host es1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:23 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1049
  • 22:22 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es1049
  • 22:22 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1049 - vriley@cumin1003"
  • 22:21 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1049 - vriley@cumin1003"
  • 22:18 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 22:13 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T402925)', diff saved to https://phabricator.wikimedia.org/P82667 and previous config saved to /var/cache/conftool/dbconfig/20250905-221244-ladsgroup.json
  • 21:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P82666 and previous config saved to /var/cache/conftool/dbconfig/20250905-215736-ladsgroup.json
  • 21:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 21:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P82665 and previous config saved to /var/cache/conftool/dbconfig/20250905-214229-ladsgroup.json
  • 21:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T402925)', diff saved to https://phabricator.wikimedia.org/P82664 and previous config saved to /var/cache/conftool/dbconfig/20250905-212721-ladsgroup.json
  • 21:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 20:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:46 jclark@cumin1002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:32 kemayo@deploy1003: Finished scap sync-world: Backport for Revert "Edit: Split footer lists into columns" (T401066 T403856) (duration: 15m 31s)
  • 20:24 kemayo@deploy1003: kemayo: Continuing with sync
  • 20:23 kemayo@deploy1003: kemayo: Backport for Revert "Edit: Split footer lists into columns" (T401066 T403856) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 20:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T401906)', diff saved to https://phabricator.wikimedia.org/P82663 and previous config saved to /var/cache/conftool/dbconfig/20250905-201818-fceratto.json
  • 20:17 kemayo@deploy1003: Started scap sync-world: Backport for Revert "Edit: Split footer lists into columns" (T401066 T403856)
  • 20:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P82662 and previous config saved to /var/cache/conftool/dbconfig/20250905-200311-fceratto.json
  • 19:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P82661 and previous config saved to /var/cache/conftool/dbconfig/20250905-194804-fceratto.json
  • 19:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T401906)', diff saved to https://phabricator.wikimedia.org/P82660 and previous config saved to /var/cache/conftool/dbconfig/20250905-193256-fceratto.json
  • 19:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T401906)', diff saved to https://phabricator.wikimedia.org/P82659 and previous config saved to /var/cache/conftool/dbconfig/20250905-193047-fceratto.json
  • 19:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 19:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 19:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T401906)', diff saved to https://phabricator.wikimedia.org/P82658 and previous config saved to /var/cache/conftool/dbconfig/20250905-193007-fceratto.json
  • 19:20 mutante: pooled ulsfo again - Lumen back up - Arelion still working
  • 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site ulsfo [reason: no reason specified, ]
  • 19:19 dzahn@cumin2002: START - Cookbook sre.dns.admin DNS admin: pool site ulsfo [reason: no reason specified, ]
  • 19:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P82657 and previous config saved to /var/cache/conftool/dbconfig/20250905-191500-fceratto.json
  • 18:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P82656 and previous config saved to /var/cache/conftool/dbconfig/20250905-185952-fceratto.json
  • 18:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T401906)', diff saved to https://phabricator.wikimedia.org/P82655 and previous config saved to /var/cache/conftool/dbconfig/20250905-184445-fceratto.json
  • 18:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2237 (T402925)', diff saved to https://phabricator.wikimedia.org/P82654 and previous config saved to /var/cache/conftool/dbconfig/20250905-184245-ladsgroup.json
  • 18:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T401906)', diff saved to https://phabricator.wikimedia.org/P82653 and previous config saved to /var/cache/conftool/dbconfig/20250905-184236-fceratto.json
  • 18:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T402925)', diff saved to https://phabricator.wikimedia.org/P82652 and previous config saved to /var/cache/conftool/dbconfig/20250905-184222-ladsgroup.json
  • 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T401906)', diff saved to https://phabricator.wikimedia.org/P82651 and previous config saved to /var/cache/conftool/dbconfig/20250905-184213-fceratto.json
  • 18:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P82650 and previous config saved to /var/cache/conftool/dbconfig/20250905-182715-ladsgroup.json
  • 18:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P82649 and previous config saved to /var/cache/conftool/dbconfig/20250905-182705-fceratto.json
  • 18:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P82648 and previous config saved to /var/cache/conftool/dbconfig/20250905-181207-ladsgroup.json
  • 18:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P82647 and previous config saved to /var/cache/conftool/dbconfig/20250905-181158-fceratto.json
  • 17:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T402925)', diff saved to https://phabricator.wikimedia.org/P82646 and previous config saved to /var/cache/conftool/dbconfig/20250905-175700-ladsgroup.json
  • 17:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T401906)', diff saved to https://phabricator.wikimedia.org/P82645 and previous config saved to /var/cache/conftool/dbconfig/20250905-175651-fceratto.json
  • 17:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T401906)', diff saved to https://phabricator.wikimedia.org/P82644 and previous config saved to /var/cache/conftool/dbconfig/20250905-175541-fceratto.json
  • 17:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 17:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T401906)', diff saved to https://phabricator.wikimedia.org/P82643 and previous config saved to /var/cache/conftool/dbconfig/20250905-175519-fceratto.json
  • 17:48 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P82642 and previous config saved to /var/cache/conftool/dbconfig/20250905-174011-fceratto.json
  • 17:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P82641 and previous config saved to /var/cache/conftool/dbconfig/20250905-172504-fceratto.json
  • 17:16 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 17:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T401906)', diff saved to https://phabricator.wikimedia.org/P82640 and previous config saved to /var/cache/conftool/dbconfig/20250905-170956-fceratto.json
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T401906)', diff saved to https://phabricator.wikimedia.org/P82639 and previous config saved to /var/cache/conftool/dbconfig/20250905-170747-fceratto.json
  • 17:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T401906)', diff saved to https://phabricator.wikimedia.org/P82638 and previous config saved to /var/cache/conftool/dbconfig/20250905-170725-fceratto.json
  • 16:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P82637 and previous config saved to /var/cache/conftool/dbconfig/20250905-165217-fceratto.json
  • 16:39 mutante: depooling ulsfo (fiber cut)
  • 16:38 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site ulsfo [reason: no reason specified, ]
  • 16:38 dzahn@cumin2002: START - Cookbook sre.dns.admin DNS admin: depool site ulsfo [reason: no reason specified, ]
  • 16:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P82636 and previous config saved to /var/cache/conftool/dbconfig/20250905-163709-fceratto.json
  • 16:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T401906)', diff saved to https://phabricator.wikimedia.org/P82635 and previous config saved to /var/cache/conftool/dbconfig/20250905-162202-fceratto.json
  • 16:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T401906)', diff saved to https://phabricator.wikimedia.org/P82634 and previous config saved to /var/cache/conftool/dbconfig/20250905-161952-fceratto.json
  • 16:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 16:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T401906)', diff saved to https://phabricator.wikimedia.org/P82633 and previous config saved to /var/cache/conftool/dbconfig/20250905-161929-fceratto.json
  • 16:16 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Update secure enclave API endpoint, hCaptcha: Fix secure enclave implementation (T378188) (duration: 61m 02s)
  • 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P82632 and previous config saved to /var/cache/conftool/dbconfig/20250905-160422-fceratto.json
  • 16:01 kharlan@deploy1003: dreamyjazz, kharlan: Continuing with sync
  • 16:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 16:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 15:59 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P82631 and previous config saved to /var/cache/conftool/dbconfig/20250905-154914-fceratto.json
  • 15:49 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:42 kharlan@deploy1003: dreamyjazz, kharlan: Backport for hCaptcha: Update secure enclave API endpoint, hCaptcha: Fix secure enclave implementation (T378188) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T401906)', diff saved to https://phabricator.wikimedia.org/P82630 and previous config saved to /var/cache/conftool/dbconfig/20250905-153407-fceratto.json
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T401906)', diff saved to https://phabricator.wikimedia.org/P82629 and previous config saved to /var/cache/conftool/dbconfig/20250905-153157-fceratto.json
  • 15:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:21 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 15:15 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2236 (T402925)', diff saved to https://phabricator.wikimedia.org/P82628 and previous config saved to /var/cache/conftool/dbconfig/20250905-151508-ladsgroup.json
  • 15:15 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Update secure enclave API endpoint, hCaptcha: Fix secure enclave implementation (T378188)
  • 15:15 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T402925)', diff saved to https://phabricator.wikimedia.org/P82627 and previous config saved to /var/cache/conftool/dbconfig/20250905-151444-ladsgroup.json
  • 15:14 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:14 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 15:06 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1052.eqiad.wmnet with OS bullseye
  • 14:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P82626 and previous config saved to /var/cache/conftool/dbconfig/20250905-145937-ladsgroup.json
  • 14:56 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:56 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:48 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P82625 and previous config saved to /var/cache/conftool/dbconfig/20250905-144552-fceratto.json
  • 14:44 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on Test Wikidata, Wikitech, and Office Wiki (T401595), Disable wmgUseMdotRouting on mediawiki.org (T403510) (duration: 13m 40s)
  • 14:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P82624 and previous config saved to /var/cache/conftool/dbconfig/20250905-144429-ladsgroup.json
  • 14:42 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage
  • 14:39 krinkle@deploy1003: krinkle: Continuing with sync
  • 14:37 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage
  • 14:36 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on Test Wikidata, Wikitech, and Office Wiki (T401595), Disable wmgUseMdotRouting on mediawiki.org (T403510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:35 sukhe: sudo cumin -b31 "A:cp-upload" "run-puppet-agent --enable 'merging CR 1184886-1184126-1184130'"
  • 14:30 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on Test Wikidata, Wikitech, and Office Wiki (T401595), Disable wmgUseMdotRouting on mediawiki.org (T403510)
  • 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P82623 and previous config saved to /var/cache/conftool/dbconfig/20250905-143045-fceratto.json
  • 14:30 sukhe: sudo cumin -b31 "A:cp-text" "run-puppet-agent --enable 'merging CR 1184886-1184126-1184130'"
  • 14:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T402925)', diff saved to https://phabricator.wikimedia.org/P82622 and previous config saved to /var/cache/conftool/dbconfig/20250905-142921-ladsgroup.json
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir3006.esams.wmnet
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3006.esams.wmnet with OS bookworm
  • 14:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T401906)', diff saved to https://phabricator.wikimedia.org/P82621 and previous config saved to /var/cache/conftool/dbconfig/20250905-141537-fceratto.json
  • 14:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T401906)', diff saved to https://phabricator.wikimedia.org/P82620 and previous config saved to /var/cache/conftool/dbconfig/20250905-141427-fceratto.json
  • 14:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T401906)', diff saved to https://phabricator.wikimedia.org/P82619 and previous config saved to /var/cache/conftool/dbconfig/20250905-141404-fceratto.json
  • 14:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3006.esams.wmnet with reason: host reimage
  • 14:13 sukhe@dns1004: END - running authdns-update
  • 14:12 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:12 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:12 sukhe@dns1004: START - running authdns-update
  • 14:09 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:08 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:08 sukhe: enabling puppet on cp3068: testing CR 1184886 T401595
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3006.esams.wmnet with reason: host reimage
  • 14:06 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1052.eqiad.wmnet with OS bullseye
  • 14:04 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1184886-1184126-1184130'" (T403510)
  • 13:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P82617 and previous config saved to /var/cache/conftool/dbconfig/20250905-135857-fceratto.json
  • 13:56 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:56 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:52 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:51 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3006.esams.wmnet with OS bookworm
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3006.esams.wmnet - jmm@cumin2002"
  • 13:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3006.esams.wmnet - jmm@cumin2002"
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3006.esams.wmnet on all recursors
  • 13:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:44 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3006.esams.wmnet on all recursors
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3006.esams.wmnet - jmm@cumin2002"
  • 13:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P82616 and previous config saved to /var/cache/conftool/dbconfig/20250905-134350-fceratto.json
  • 13:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3006.esams.wmnet - jmm@cumin2002"
  • 13:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3006.esams.wmnet
  • 13:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T401906)', diff saved to https://phabricator.wikimedia.org/P82615 and previous config saved to /var/cache/conftool/dbconfig/20250905-132842-fceratto.json
  • 13:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 13:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T401906)', diff saved to https://phabricator.wikimedia.org/P82614 and previous config saved to /var/cache/conftool/dbconfig/20250905-132632-fceratto.json
  • 13:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:23 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 13:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 13:21 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 13:20 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 13:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1233.eqiad.wmnet with OS bullseye
  • 13:17 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1233.eqiad.wmnet with OS bullseye
  • 13:14 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 53066
  • 13:14 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 53066
  • 13:11 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3006.esams.wmnet
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3006.esams.wmnet with OS bookworm
  • 13:09 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:04 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:04 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:58 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:55 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:54 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:53 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:53 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3006.esams.wmnet with reason: host reimage
  • 12:48 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3006.esams.wmnet with reason: host reimage
  • 12:47 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:28 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum3006.esams.wmnet with OS bookworm
  • 12:23 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3006.esams.wmnet - jmm@cumin2002"
  • 12:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3006.esams.wmnet - jmm@cumin2002"
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum3006.esams.wmnet on all recursors
  • 12:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum3006.esams.wmnet on all recursors
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3006.esams.wmnet - jmm@cumin2002"
  • 12:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3006.esams.wmnet - jmm@cumin2002"
  • 12:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum3006.esams.wmnet
  • 11:58 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1233.eqiad.wmnet with OS bullseye
  • 11:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2219 (T402925)', diff saved to https://phabricator.wikimedia.org/P82613 and previous config saved to /var/cache/conftool/dbconfig/20250905-114129-ladsgroup.json
  • 11:41 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T402925)', diff saved to https://phabricator.wikimedia.org/P82612 and previous config saved to /var/cache/conftool/dbconfig/20250905-114107-ladsgroup.json
  • 11:33 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:31 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 11:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P82611 and previous config saved to /var/cache/conftool/dbconfig/20250905-112559-ladsgroup.json
  • 11:18 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:16 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 11:15 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P82610 and previous config saved to /var/cache/conftool/dbconfig/20250905-111052-ladsgroup.json
  • 10:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T402925)', diff saved to https://phabricator.wikimedia.org/P82609 and previous config saved to /var/cache/conftool/dbconfig/20250905-105544-ladsgroup.json
  • 10:37 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 10:36 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 10:33 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:31 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster dse-codfw: Kubernetes upgrade
  • 09:45 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1236.eqiad.wmnet with OS bullseye
  • 09:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:42 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2014.codfw.wmnet with OS bookworm
  • 09:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1236.eqiad.wmnet with reason: host reimage
  • 09:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1235.eqiad.wmnet with reason: host reimage
  • 09:23 btullis@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster dse-codfw: Kubernetes upgrade
  • 09:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:19 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1236.eqiad.wmnet with reason: host reimage
  • 09:17 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1235.eqiad.wmnet with reason: host reimage
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2014.codfw.wmnet with reason: host reimage
  • 09:11 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2014.codfw.wmnet with reason: host reimage
  • 09:07 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1236.eqiad.wmnet with OS bullseye
  • 08:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 08:49 Emperor: remove dbg packages & repool ms-fe2016 T360913
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti3007.esams.wmnet
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 08:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2014.codfw.wmnet with OS bookworm
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2013.codfw.wmnet with OS bookworm
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage
  • 08:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage
  • 08:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2210 (T402925)', diff saved to https://phabricator.wikimedia.org/P82607 and previous config saved to /var/cache/conftool/dbconfig/20250905-081811-ladsgroup.json
  • 08:18 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 08:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T402925)', diff saved to https://phabricator.wikimedia.org/P82606 and previous config saved to /var/cache/conftool/dbconfig/20250905-081748-ladsgroup.json
  • 08:10 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti3007.esams.wmnet
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 08:05 hashar: Restarted CI Jenkins to update plugins
  • 08:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P82605 and previous config saved to /var/cache/conftool/dbconfig/20250905-080241-ladsgroup.json
  • 08:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2013.codfw.wmnet with OS bookworm
  • 07:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2012.codfw.wmnet with OS bookworm
  • 07:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P82604 and previous config saved to /var/cache/conftool/dbconfig/20250905-074733-ladsgroup.json
  • 07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 07:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3006.wikimedia.org
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3006.wikimedia.org with OS bookworm
  • 07:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage
  • 07:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T402925)', diff saved to https://phabricator.wikimedia.org/P82603 and previous config saved to /var/cache/conftool/dbconfig/20250905-073225-ladsgroup.json
  • 07:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3006.wikimedia.org with reason: host reimage
  • 07:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3006.wikimedia.org with reason: host reimage
  • 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm
  • 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host doh3006.wikimedia.org with OS bookworm
  • 06:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3006.wikimedia.org - jmm@cumin2002"
  • 06:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3006.wikimedia.org - jmm@cumin2002"
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3006.wikimedia.org on all recursors
  • 06:50 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh3006.wikimedia.org on all recursors
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3006.wikimedia.org - jmm@cumin2002"
  • 06:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3006.wikimedia.org - jmm@cumin2002"
  • 06:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3006.wikimedia.org
  • 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install2004.wikimedia.org
  • 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 06:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 06:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:38 kharlan@deploy1003: Sync cancelled.
  • 06:36 kharlan@deploy1003: kharlan: Backport for hCaptcha: Update secure enclave API endpoint synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:34 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install2004.wikimedia.org
  • 06:30 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Update secure enclave API endpoint
  • 06:04 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139628
  • 05:54 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 139628
  • 04:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2206 (T402925)', diff saved to https://phabricator.wikimedia.org/P82602 and previous config saved to /var/cache/conftool/dbconfig/20250905-045020-ladsgroup.json
  • 04:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 03:19 ryankemper@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new opensearch plugins pkg - ryankemper@cumin1002 - T403749
  • 02:18 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 02:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T402925)', diff saved to https://phabricator.wikimedia.org/P82601 and previous config saved to /var/cache/conftool/dbconfig/20250905-021830-ladsgroup.json
  • 02:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P82600 and previous config saved to /var/cache/conftool/dbconfig/20250905-020323-ladsgroup.json
  • 01:55 ryankemper@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 55 hosts with reason: rolling restart cirrus eqiad
  • 01:52 ryankemper@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply new opensearch plugins pkg - ryankemper@cumin1002 - T403749
  • 01:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P82599 and previous config saved to /var/cache/conftool/dbconfig/20250905-014815-ladsgroup.json
  • 01:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T402925)', diff saved to https://phabricator.wikimedia.org/P82598 and previous config saved to /var/cache/conftool/dbconfig/20250905-013307-ladsgroup.json
  • 00:07 ryankemper@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new opensearch plugins pkg - ryankemper@cumin1002 - T403749

2025-09-04

  • 23:42 swfrench-wmf: finished single-replica PHP 8.3 pilot on shellbox-syntaxhighlight - T403284
  • 23:41 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:41 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:41 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:41 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:41 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:40 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 23:12 ryankemper@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new opensearch plugins pkg - ryankemper@cumin1002 - T403749
  • 22:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2179 (T402925)', diff saved to https://phabricator.wikimedia.org/P82597 and previous config saved to /var/cache/conftool/dbconfig/20250904-224604-ladsgroup.json
  • 22:45 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 22:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T402925)', diff saved to https://phabricator.wikimedia.org/P82596 and previous config saved to /var/cache/conftool/dbconfig/20250904-224540-ladsgroup.json
  • 22:45 ryankemper@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new opensearch plugins pkg - ryankemper@cumin1002 - T403749
  • 22:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P82595 and previous config saved to /var/cache/conftool/dbconfig/20250904-223032-ladsgroup.json
  • 22:21 ryankemper@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply new opensearch plugins pkg - ryankemper@cumin1002 - T403749
  • 22:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P82594 and previous config saved to /var/cache/conftool/dbconfig/20250904-221525-ladsgroup.json
  • 22:06 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new opensearch plugins pkg - bking@cumin1002 - T403749
  • 22:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T402925)', diff saved to https://phabricator.wikimedia.org/P82593 and previous config saved to /var/cache/conftool/dbconfig/20250904-220017-ladsgroup.json
  • 21:38 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new opensearch plugins pkg - bking@cumin1002 - T403749
  • 21:37 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: apply new opensearch plugins pkg - bking@cumin1002 - T403749
  • 21:31 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: apply new opensearch plugins pkg - bking@cumin1002 - T403749
  • 21:27 sbassett: Deployed security fix for T403411 to 1.45.0-wmf.17
  • 21:17 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:03 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:46 jgleeson: civicrm upgraded from a8f49cc4 to d4a2ed6e
  • 19:53 logmsgbot: dreamyjazz Deployed security patch for T403757
  • 19:33 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:11 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Create checkuser-suggested-investigations.dblist (T403471) (duration: 12m 36s)
  • 19:05 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 19:03 dreamyjazz@deploy1003: dreamyjazz: Backport for Create checkuser-suggested-investigations.dblist (T403471) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:58 dreamyjazz@deploy1003: Started scap sync-world: Backport for Create checkuser-suggested-investigations.dblist (T403471)
  • 18:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2172 (T402925)', diff saved to https://phabricator.wikimedia.org/P82591 and previous config saved to /var/cache/conftool/dbconfig/20250904-185431-ladsgroup.json
  • 18:54 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T402925)', diff saved to https://phabricator.wikimedia.org/P82590 and previous config saved to /var/cache/conftool/dbconfig/20250904-185418-ladsgroup.json
  • 18:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P82588 and previous config saved to /var/cache/conftool/dbconfig/20250904-183911-ladsgroup.json
  • 18:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 212635
  • 18:36 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 212635
  • 18:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 40731
  • 18:35 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 40731
  • 18:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 18:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P82587 and previous config saved to /var/cache/conftool/dbconfig/20250904-182403-ladsgroup.json
  • 18:09 dancy@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.17 refs T396378
  • 18:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T402925)', diff saved to https://phabricator.wikimedia.org/P82586 and previous config saved to /var/cache/conftool/dbconfig/20250904-180855-ladsgroup.json
  • 18:01 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus3003.esams.wmnet
  • 18:01 tappof@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:01 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1002"
  • 18:00 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1002"
  • 17:57 tappof@cumin1002: START - Cookbook sre.dns.netbox
  • 17:52 tappof@cumin1002: START - Cookbook sre.hosts.decommission for hosts prometheus3003.esams.wmnet
  • 17:46 ryankemper: [WDQS] T403738 Rolling restart of `envoyproxy.service` on `wdqs-main`, 2 hosts at a time
  • 17:28 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1236.eqiad.wmnet with OS bullseye
  • 17:27 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1235.eqiad.wmnet with OS bullseye
  • 17:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T401906)', diff saved to https://phabricator.wikimedia.org/P82585 and previous config saved to /var/cache/conftool/dbconfig/20250904-172250-fceratto.json
  • 17:09 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P82584 and previous config saved to /var/cache/conftool/dbconfig/20250904-170743-fceratto.json
  • 17:07 rzl: deployed chart 0.11.11 to api-gateway and rest-gateway prod, T403101
  • 17:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:04 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:04 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:04 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:04 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:03 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:02 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 17:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 17:00 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 16:59 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1014.eqiad.wmnet with OS bookworm
  • 16:53 swfrench-wmf: started single-replica PHP 8.3 pilot on shellbox-syntaxhighlight in eqiad - T403284
  • 16:52 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P82583 and previous config saved to /var/cache/conftool/dbconfig/20250904-165235-fceratto.json
  • 16:52 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:44 btullis: upgrading and restarting envoyproxy on cephosd200[1-3] for T402584
  • 16:44 rzl: deployed chart 0.11.11 to api-gateway and rest-gateway staging, T403101
  • 16:42 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:42 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:39 btullis: upgrading and restarting envoyproxy on cephosd100[2-5] for T402584
  • 16:39 swfrench-wmf: started single-replica PHP 8.3 pilot on shellbox-syntaxhighlight in codfw - T403284
  • 16:38 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:37 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T401906)', diff saved to https://phabricator.wikimedia.org/P82582 and previous config saved to /var/cache/conftool/dbconfig/20250904-163727-fceratto.json
  • 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T401906)', diff saved to https://phabricator.wikimedia.org/P82581 and previous config saved to /var/cache/conftool/dbconfig/20250904-163517-fceratto.json
  • 16:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 16:35 btullis: upgrading and restarting envoyproxy on cephosd1001 for T402584
  • 16:33 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:33 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:33 rzl@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:33 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:50 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T401906)', diff saved to https://phabricator.wikimedia.org/P82580 and previous config saved to /var/cache/conftool/dbconfig/20250904-154934-fceratto.json
  • 15:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T401906)', diff saved to https://phabricator.wikimedia.org/P82579 and previous config saved to /var/cache/conftool/dbconfig/20250904-154824-fceratto.json
  • 15:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 15:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T401906)', diff saved to https://phabricator.wikimedia.org/P82578 and previous config saved to /var/cache/conftool/dbconfig/20250904-154744-fceratto.json
  • 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P82577 and previous config saved to /var/cache/conftool/dbconfig/20250904-153236-fceratto.json
  • 15:31 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:27 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:25 tappof: migration from prometheus3003.esams to prometheus3004 has been completed T403620
  • 15:22 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:22 moritzm: upgrade Envoyproxy on cloudweb servers T402584
  • 15:22 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:20 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:17 moritzm: installing apache2 security updates
  • 15:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P82576 and previous config saved to /var/cache/conftool/dbconfig/20250904-151729-fceratto.json
  • 15:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:13 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1236.eqiad.wmnet with OS bullseye
  • 15:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2155 (T402925)', diff saved to https://phabricator.wikimedia.org/P82575 and previous config saved to /var/cache/conftool/dbconfig/20250904-151235-ladsgroup.json
  • 15:12 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T402925)', diff saved to https://phabricator.wikimedia.org/P82574 and previous config saved to /var/cache/conftool/dbconfig/20250904-151223-ladsgroup.json
  • 15:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1235.eqiad.wmnet with OS bullseye
  • 15:06 tappof@dns1004: END - running authdns-update
  • 15:05 tappof@dns1004: START - running authdns-update
  • 15:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T401906)', diff saved to https://phabricator.wikimedia.org/P82573 and previous config saved to /var/cache/conftool/dbconfig/20250904-150221-fceratto.json
  • 15:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:00 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 15:00 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T401906)', diff saved to https://phabricator.wikimedia.org/P82572 and previous config saved to /var/cache/conftool/dbconfig/20250904-150011-fceratto.json
  • 15:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T401906)', diff saved to https://phabricator.wikimedia.org/P82571 and previous config saved to /var/cache/conftool/dbconfig/20250904-145948-fceratto.json
  • 14:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P82570 and previous config saved to /var/cache/conftool/dbconfig/20250904-145716-ladsgroup.json
  • 14:54 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:51 moritzm: upgrade Envoyproxy on Puppet servers T402584
  • 14:51 XioNoX: disable OSPF on mr1-ulsfo to test BGP
  • 14:46 pt1979@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mr1-ulsfo with reason: Bgp testing
  • 14:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P82569 and previous config saved to /var/cache/conftool/dbconfig/20250904-144441-fceratto.json
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir3005.esams.wmnet to drbd
  • 14:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P82567 and previous config saved to /var/cache/conftool/dbconfig/20250904-144208-ladsgroup.json
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir3005.esams.wmnet to drbd
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3007.esams.wmnet to cluster esams03 and group B
  • 14:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P82566 and previous config saved to /var/cache/conftool/dbconfig/20250904-142933-fceratto.json
  • 14:28 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3007.esams.wmnet to cluster esams03 and group B
  • 14:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T402925)', diff saved to https://phabricator.wikimedia.org/P82565 and previous config saved to /var/cache/conftool/dbconfig/20250904-142701-ladsgroup.json
  • 14:25 moritzm: upgrade Envoyproxy on webperf* T402584
  • 14:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 14:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T401906)', diff saved to https://phabricator.wikimedia.org/P82564 and previous config saved to /var/cache/conftool/dbconfig/20250904-141426-fceratto.json
  • 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T401906)', diff saved to https://phabricator.wikimedia.org/P82562 and previous config saved to /var/cache/conftool/dbconfig/20250904-141215-fceratto.json
  • 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T401906)', diff saved to https://phabricator.wikimedia.org/P82561 and previous config saved to /var/cache/conftool/dbconfig/20250904-141152-fceratto.json
  • 14:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 270735
  • 14:01 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 270735
  • 14:00 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from dumpsdata1007 to an-worker1236
  • 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1236
  • 13:56 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1236
  • 13:56 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1236 on all recursors
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P82560 and previous config saved to /var/cache/conftool/dbconfig/20250904-135645-fceratto.json
  • 13:56 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-worker1236 on all recursors
  • 13:56 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:56 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming dumpsdata1007 to an-worker1236 - btullis@cumin1003"
  • 13:56 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming dumpsdata1007 to an-worker1236 - btullis@cumin1003"
  • 13:52 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:52 btullis@cumin1003: START - Cookbook sre.hosts.rename from dumpsdata1007 to an-worker1236
  • 13:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove VIP for esams01 - jmm@cumin2002"
  • 13:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove VIP for esams01 - jmm@cumin2002"
  • 13:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from dumpsdata1006 to an-worker1235
  • 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1235
  • 13:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P82558 and previous config saved to /var/cache/conftool/dbconfig/20250904-134137-fceratto.json
  • 13:38 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1235
  • 13:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1235 on all recursors
  • 13:38 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-worker1235 on all recursors
  • 13:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:35 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:35 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:34 mforns@deploy1003: Finished deploy [analytics/refinery@a1f5011] (thin): Fix for pageview actor automated reasons THIN [analytics/refinery@a1f5011b] (duration: 00m 57s)
  • 13:33 mforns@deploy1003: Started deploy [analytics/refinery@a1f5011] (thin): Fix for pageview actor automated reasons THIN [analytics/refinery@a1f5011b]
  • 13:32 mforns@deploy1003: Finished deploy [analytics/refinery@a1f5011]: Fix for pageview actor automated reasons [analytics/refinery@a1f5011b] (duration: 02m 52s)
  • 13:31 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnsdse-k8s-worker1014 - jclark@cumin1002"
  • 13:31 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnsdse-k8s-worker1014 - jclark@cumin1002"
  • 13:30 jgreen@dns1004: END - running authdns-update
  • 13:30 mforns@deploy1003: Started deploy [analytics/refinery@a1f5011]: Fix for pageview actor automated reasons [analytics/refinery@a1f5011b]
  • 13:29 jgreen@dns1004: START - running authdns-update
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T401906)', diff saved to https://phabricator.wikimedia.org/P82557 and previous config saved to /var/cache/conftool/dbconfig/20250904-132630-fceratto.json
  • 13:25 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:24 dcaro@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudcephosd1052.eqiad.wmnet with reason: swapping network card
  • 13:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T401906)', diff saved to https://phabricator.wikimedia.org/P82556 and previous config saved to /var/cache/conftool/dbconfig/20250904-132419-fceratto.json
  • 13:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T401906)', diff saved to https://phabricator.wikimedia.org/P82555 and previous config saved to /var/cache/conftool/dbconfig/20250904-132356-fceratto.json
  • 13:20 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:20 btullis@cumin1003: START - Cookbook sre.hosts.rename from dumpsdata1006 to an-worker1235
  • 13:13 hashar: upgrading CI Jenkins | T403703
  • 13:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P82554 and previous config saved to /var/cache/conftool/dbconfig/20250904-130848-fceratto.json
  • 13:04 XioNoX: push pfw policies - T403717
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3007.esams.wmnet with OS bookworm
  • 12:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P82553 and previous config saved to /var/cache/conftool/dbconfig/20250904-125341-fceratto.json
  • 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T401906)', diff saved to https://phabricator.wikimedia.org/P82552 and previous config saved to /var/cache/conftool/dbconfig/20250904-123833-fceratto.json
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T401906)', diff saved to https://phabricator.wikimedia.org/P82551 and previous config saved to /var/cache/conftool/dbconfig/20250904-123723-fceratto.json
  • 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T401906)', diff saved to https://phabricator.wikimedia.org/P82550 and previous config saved to /var/cache/conftool/dbconfig/20250904-123701-fceratto.json
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
  • 12:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3007.esams.wmnet with reason: host reimage
  • 12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P82549 and previous config saved to /var/cache/conftool/dbconfig/20250904-122153-fceratto.json
  • 12:14 arnoldokoth: Upgrade envoyproxy on vrts1003 T402584
  • 12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P82548 and previous config saved to /var/cache/conftool/dbconfig/20250904-120646-fceratto.json
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3007.esams.wmnet with OS bookworm
  • 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T401906)', diff saved to https://phabricator.wikimedia.org/P82547 and previous config saved to /var/cache/conftool/dbconfig/20250904-115135-fceratto.json
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T401906)', diff saved to https://phabricator.wikimedia.org/P82546 and previous config saved to /var/cache/conftool/dbconfig/20250904-115025-fceratto.json
  • 11:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T401906)', diff saved to https://phabricator.wikimedia.org/P82545 and previous config saved to /var/cache/conftool/dbconfig/20250904-115002-fceratto.json
  • 11:44 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b41bbe7] (releasing): Update production releases Jenkins (duration: 00m 36s)
  • 11:43 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b41bbe7] (releasing): Update production releases Jenkins
  • 11:36 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b41bbe7] (releasing): Testing (duration: 00m 26s)
  • 11:36 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b41bbe7] (releasing): Testing
  • 11:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P82544 and previous config saved to /var/cache/conftool/dbconfig/20250904-113455-fceratto.json
  • 11:32 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b41bbe7] (releasing): Testing (duration: 00m 38s)
  • 11:32 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b41bbe7] (releasing): Testing
  • 11:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2147 (T402925)', diff saved to https://phabricator.wikimedia.org/P82543 and previous config saved to /var/cache/conftool/dbconfig/20250904-112804-ladsgroup.json
  • 11:27 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 11:21 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@9a6431c] (releasing): Update backup releases Jenkins (duration: 02m 09s)
  • 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P82542 and previous config saved to /var/cache/conftool/dbconfig/20250904-111947-fceratto.json
  • 11:19 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@9a6431c] (releasing): Update backup releases Jenkins
  • 11:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T401906)', diff saved to https://phabricator.wikimedia.org/P82541 and previous config saved to /var/cache/conftool/dbconfig/20250904-110440-fceratto.json
  • 11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T401906)', diff saved to https://phabricator.wikimedia.org/P82540 and previous config saved to /var/cache/conftool/dbconfig/20250904-110230-fceratto.json
  • 11:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T401906)', diff saved to https://phabricator.wikimedia.org/P82539 and previous config saved to /var/cache/conftool/dbconfig/20250904-110207-fceratto.json
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast3007.wikimedia.org
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast3007.wikimedia.org with OS bookworm
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P82538 and previous config saved to /var/cache/conftool/dbconfig/20250904-104700-fceratto.json
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 10:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast3007.wikimedia.org with reason: host reimage
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir3003.esams.wmnet
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir3003.esams.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P82537 and previous config saved to /var/cache/conftool/dbconfig/20250904-103153-fceratto.json
  • 10:29 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:24 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir3003.esams.wmnet
  • 10:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T401906)', diff saved to https://phabricator.wikimedia.org/P82536 and previous config saved to /var/cache/conftool/dbconfig/20250904-101645-fceratto.json
  • 10:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T401906)', diff saved to https://phabricator.wikimedia.org/P82535 and previous config saved to /var/cache/conftool/dbconfig/20250904-101435-fceratto.json
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast3007.wikimedia.org with OS bookworm
  • 10:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T401906)', diff saved to https://phabricator.wikimedia.org/P82534 and previous config saved to /var/cache/conftool/dbconfig/20250904-101412-fceratto.json
  • 10:02 moritzm: imported jenkins 2.516.2 for Bullseye/Bookworm T403703
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 10:00 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast3007.wikimedia.org - jmm@cumin2002"
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast3007.wikimedia.org on all recursors
  • 09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache bast3007.wikimedia.org on all recursors
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 09:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P82533 and previous config saved to /var/cache/conftool/dbconfig/20250904-095904-fceratto.json
  • 09:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast3007.wikimedia.org - jmm@cumin2002"
  • 09:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host bast3007.wikimedia.org
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host prometheus3004.esams.wmnet
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus3004.esams.wmnet with OS bookworm
  • 09:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P82532 and previous config saved to /var/cache/conftool/dbconfig/20250904-094357-fceratto.json
  • 09:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 09:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 09:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 09:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:37 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:37 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 09:37 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 09:37 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 09:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus3004.esams.wmnet with reason: host reimage
  • 09:28 ayounsi@cumin1003: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 7679
  • 09:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T401906)', diff saved to https://phabricator.wikimedia.org/P82530 and previous config saved to /var/cache/conftool/dbconfig/20250904-092849-fceratto.json
  • 09:28 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 7679
  • 09:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T401906)', diff saved to https://phabricator.wikimedia.org/P82529 and previous config saved to /var/cache/conftool/dbconfig/20250904-092639-fceratto.json
  • 09:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus3004.esams.wmnet with reason: host reimage
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus3004.esams.wmnet with OS bookworm
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3004.esams.wmnet - jmm@cumin2002"
  • 09:00 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus3004.esams.wmnet - jmm@cumin2002"
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus3004.esams.wmnet on all recursors
  • 09:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache prometheus3004.esams.wmnet on all recursors
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3004.esams.wmnet - jmm@cumin2002"
  • 08:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus3004.esams.wmnet - jmm@cumin2002"
  • 08:56 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host prometheus3004.esams.wmnet
  • 08:45 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T402925)', diff saved to https://phabricator.wikimedia.org/P82528 and previous config saved to /var/cache/conftool/dbconfig/20250904-084508-ladsgroup.json
  • 08:39 elukey: kill and restart imposm on maps-test2001 - stuck since August 10, lag building up and alerts
  • 08:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas3001.wikimedia.org
  • 08:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:36 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
  • 08:35 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
  • 08:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:35 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:32 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:31 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:31 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host atlas3001.wikimedia.org
  • 08:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P82527 and previous config saved to /var/cache/conftool/dbconfig/20250904-083001-ladsgroup.json
  • 08:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3005.esams.wmnet to cluster esams03 and group B
  • 08:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3005.esams.wmnet to cluster esams03 and group B
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 08:17 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 150178
  • 08:16 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 150178
  • 08:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P82526 and previous config saved to /var/cache/conftool/dbconfig/20250904-081453-ladsgroup.json
  • 08:10 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 08:01 kartik@deploy1003: Finished scap sync-world: Backport for Revert^2 "TranslationUnitDTO: Make blob type properties writable" (duration: 15m 30s)
  • 07:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T402925)', diff saved to https://phabricator.wikimedia.org/P82525 and previous config saved to /var/cache/conftool/dbconfig/20250904-075945-ladsgroup.json
  • 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7679
  • 07:58 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 7679
  • 07:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264927
  • 07:58 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 264927
  • 07:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267517
  • 07:57 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 267517
  • 07:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268197
  • 07:57 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 268197
  • 07:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40731
  • 07:57 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 40731
  • 07:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270364
  • 07:57 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 270364
  • 07:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 150178
  • 07:56 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 150178
  • 07:56 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263270
  • 07:56 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 263270
  • 07:56 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263016
  • 07:56 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 263016
  • 07:55 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28604
  • 07:55 kartik@deploy1003: abi, kartik: Continuing with sync
  • 07:55 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 28604
  • 07:55 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 212635
  • 07:54 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 212635
  • 07:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 273421
  • 07:53 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 273421
  • 07:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7063
  • 07:53 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 7063
  • 07:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268188
  • 07:53 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 268188
  • 07:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262662
  • 07:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 262662
  • 07:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53066
  • 07:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 53066
  • 07:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267614
  • 07:52 kartik@deploy1003: abi, kartik: Backport for Revert^2 "TranslationUnitDTO: Make blob type properties writable" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:51 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 267614
  • 07:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266539
  • 07:51 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 266539
  • 07:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269548
  • 07:51 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 269548
  • 07:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263908
  • 07:51 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 263908
  • 07:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 265966
  • 07:50 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 265966
  • 07:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28652
  • 07:50 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 28652
  • 07:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262316
  • 07:50 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 262316
  • 07:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 272207
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3005.esams.wmnet with OS bookworm
  • 07:49 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 272207
  • 07:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52762
  • 07:49 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 52762
  • 07:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45014
  • 07:49 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 45014
  • 07:47 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264011
  • 07:46 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 264011
  • 07:46 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 265249
  • 07:45 kartik@deploy1003: Started scap sync-world: Backport for Revert^2 "TranslationUnitDTO: Make blob type properties writable"
  • 07:45 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 265249
  • 07:45 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 139628
  • 07:45 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 139628
  • 07:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262412
  • 07:44 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 262412
  • 07:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199710
  • 07:43 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 199710
  • 07:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270735
  • 07:43 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 270735
  • 07:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9002
  • 07:42 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 9002
  • 07:42 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266240
  • 07:41 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 266240
  • 07:41 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52968
  • 07:41 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 52968
  • 07:41 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 273363
  • 07:41 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 273363
  • 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268795
  • 07:40 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 268795
  • 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269396
  • 07:40 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 269396
  • 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267536
  • 07:39 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 267536
  • 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 41327
  • 07:39 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 41327
  • 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262777
  • 07:38 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 262777
  • 07:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28126
  • 07:37 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 28126
  • 07:37 phuedx@deploy1003: Finished scap sync-world: Backport for MetricsPlatform: Enable overrides everywhere (T402369) (duration: 15m 33s)
  • 07:32 phuedx@deploy1003: phuedx: Continuing with sync
  • 07:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
  • 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
  • 07:26 phuedx@deploy1003: phuedx: Backport for MetricsPlatform: Enable overrides everywhere (T402369) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
  • 07:21 phuedx@deploy1003: Started scap sync-world: Backport for MetricsPlatform: Enable overrides everywhere (T402369)
  • 07:20 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
  • 07:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bookworm
  • 06:46 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 06:36 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 05:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1252 (T402925)', diff saved to https://phabricator.wikimedia.org/P82524 and previous config saved to /var/cache/conftool/dbconfig/20250904-051806-ladsgroup.json
  • 05:17 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T402925)', diff saved to https://phabricator.wikimedia.org/P82523 and previous config saved to /var/cache/conftool/dbconfig/20250904-051743-ladsgroup.json
  • 05:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P82522 and previous config saved to /var/cache/conftool/dbconfig/20250904-050235-ladsgroup.json
  • 04:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P82521 and previous config saved to /var/cache/conftool/dbconfig/20250904-044728-ladsgroup.json
  • 04:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T402925)', diff saved to https://phabricator.wikimedia.org/P82520 and previous config saved to /var/cache/conftool/dbconfig/20250904-043220-ladsgroup.json
  • 01:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1249 (T402925)', diff saved to https://phabricator.wikimedia.org/P82519 and previous config saved to /var/cache/conftool/dbconfig/20250904-015952-ladsgroup.json
  • 01:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 01:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T402925)', diff saved to https://phabricator.wikimedia.org/P82518 and previous config saved to /var/cache/conftool/dbconfig/20250904-015929-ladsgroup.json
  • 01:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P82517 and previous config saved to /var/cache/conftool/dbconfig/20250904-014422-ladsgroup.json
  • 01:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P82516 and previous config saved to /var/cache/conftool/dbconfig/20250904-012914-ladsgroup.json
  • 01:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T402925)', diff saved to https://phabricator.wikimedia.org/P82515 and previous config saved to /var/cache/conftool/dbconfig/20250904-011407-ladsgroup.json
  • 01:13 kemayo@deploy1003: Finished scap sync-world: Backport for EditAttemptStep: don't error if something is blocking session logging (T403656), EditAttemptStep: don't error if something is blocking session logging (T403656) (duration: 12m 07s)
  • 01:08 kemayo@deploy1003: jforrester, kemayo: Continuing with sync
  • 01:06 kemayo@deploy1003: jforrester, kemayo: Backport for EditAttemptStep: don't error if something is blocking session logging (T403656), EditAttemptStep: don't error if something is blocking session logging (T403656) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 01:01 kemayo@deploy1003: Started scap sync-world: Backport for EditAttemptStep: don't error if something is blocking session logging (T403656), EditAttemptStep: don't error if something is blocking session logging (T403656)
  • 00:45 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 00:45 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 00:45 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 00:44 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 00:44 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 00:43 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 00:07 krinkle@deploy1003: Finished scap sync-world: Backport for Disable wmgUseMdotRouting on testwiki in prod (T401595) (duration: 09m 30s)
  • 00:01 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:00 krinkle@deploy1003: krinkle: Backport for Disable wmgUseMdotRouting on testwiki in prod (T401595) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

2025-09-03

  • 23:57 krinkle@deploy1003: Started scap sync-world: Backport for Disable wmgUseMdotRouting on testwiki in prod (T401595)
  • 23:38 denisse: Adding slack_bot_token to private repo - T401730
  • 22:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1248 (T402925)', diff saved to https://phabricator.wikimedia.org/P82513 and previous config saved to /var/cache/conftool/dbconfig/20250903-225738-ladsgroup.json
  • 22:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 22:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T402925)', diff saved to https://phabricator.wikimedia.org/P82512 and previous config saved to /var/cache/conftool/dbconfig/20250903-225714-ladsgroup.json
  • 22:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P82511 and previous config saved to /var/cache/conftool/dbconfig/20250903-224206-ladsgroup.json
  • 22:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P82510 and previous config saved to /var/cache/conftool/dbconfig/20250903-222659-ladsgroup.json
  • 22:23 jdlrobson@deploy1003: Finished scap sync-world: Backport for Cleanup special wikis (T400066) (duration: 11m 47s)
  • 22:18 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 22:16 jdlrobson@deploy1003: jdlrobson: Backport for Cleanup special wikis (T400066) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T402925)', diff saved to https://phabricator.wikimedia.org/P82509 and previous config saved to /var/cache/conftool/dbconfig/20250903-221151-ladsgroup.json
  • 22:11 jdlrobson@deploy1003: Started scap sync-world: Backport for Cleanup special wikis (T400066)
  • 21:56 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 21:50 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 21:43 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 21:37 James_F: Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --quick --zType Z4 --verbose` to try to fix T403671
  • 21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify DNS for frm2002 and frdb2002 - pt1979@cumin2002"
  • 21:13 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify DNS for frm2002 and frdb2002 - pt1979@cumin2002"
  • 21:12 catrope@deploy1003: Finished scap sync-world: Backport for Fix display of Codex message icons (T401457), Fix display of Codex message icons (T401457) (duration: 13m 20s)
  • 21:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:07 catrope@deploy1003: catrope: Continuing with sync
  • 21:06 catrope@deploy1003: catrope: Backport for Fix display of Codex message icons (T401457), Fix display of Codex message icons (T401457) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:59 catrope@deploy1003: Started scap sync-world: Backport for Fix display of Codex message icons (T401457), Fix display of Codex message icons (T401457)
  • 20:45 kemayo@deploy1003: Finished scap sync-world: Backport for Edit check: log to VEFU if a tone check would have been shown if not for the a/b test (T394952), Edit check: deploy tone a/b test to frwiki, jawiki, ptwiki (T389231) (duration: 11m 12s)
  • 20:40 kemayo@deploy1003: kemayo: Continuing with sync
  • 20:39 Dreamy_Jazz: Created cusi_case on testwiki extension1 - T403473
  • 20:39 kemayo@deploy1003: kemayo: Backport for Edit check: log to VEFU if a tone check would have been shown if not for the a/b test (T394952), Edit check: deploy tone a/b test to frwiki, jawiki, ptwiki (T389231) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 Dreamy_Jazz: Created cusi_signal on frwiki, zhwiki, idwiki, jawiki, fawiki, ptwiki, trwiki, and enwiki in the extension1 cluster - T403473
  • 20:37 Dreamy_Jazz: Created cusi_case on frwiki, zhwiki, idwiki, jawiki, fawiki, ptwiki, trwiki, and enwiki in the extension1 cluster - T403473
  • 20:36 Dreamy_Jazz: Created cusi_user on frwiki, zhwiki, idwiki, jawiki, fawiki, ptwiki, trwiki, and enwiki in the extension1 cluster - T403473
  • 20:36 Dreamy_Jazz: Created cusi_user on frwiki, zhwiki, idwiki, jawiki, fawiki, ptwiki, trwiki, and enwiki
  • 20:34 kemayo@deploy1003: Started scap sync-world: Backport for Edit check: log to VEFU if a tone check would have been shown if not for the a/b test (T394952), Edit check: deploy tone a/b test to frwiki, jawiki, ptwiki (T389231)
  • 20:34 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 20:32 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps1014.eqiad.wmnet with OS bookworm
  • 20:32 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:31 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 20:31 Dreamy_Jazz: Created cusi_signal on testwiki extension1 - T403473
  • 20:30 Dreamy_Jazz: Created cusi_user on testwiki extension1 - T403473
  • 20:29 Dreamy_Jazz: Created cusi_case on testwiki extension1
  • 20:27 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:17 dani@deploy1003: Finished scap sync-world: Backport for Fix typo on newcomers survey (T402915), tlwiktionary: add logos (T403433) (duration: 12m 43s)
  • 20:17 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:12 dani@deploy1003: chlod, dani: Continuing with sync
  • 20:11 dani@deploy1003: chlod, dani: Backport for Fix typo on newcomers survey (T402915), tlwiktionary: add logos (T403433) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1014.eqiad.wmnet with reason: host reimage
  • 20:06 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host backup1012.eqiad.wmnet with OS bookworm
  • 20:05 dani@deploy1003: Started scap sync-world: Backport for Fix typo on newcomers survey (T402915), tlwiktionary: add logos (T403433)
  • 20:02 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1014.eqiad.wmnet with reason: host reimage
  • 19:51 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps1013.eqiad.wmnet with OS bookworm
  • 19:46 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:43 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 19:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1247 (T402925)', diff saved to https://phabricator.wikimedia.org/P82508 and previous config saved to /var/cache/conftool/dbconfig/20250903-193624-ladsgroup.json
  • 19:36 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 19:34 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host maps1014.eqiad.wmnet with OS bookworm
  • 19:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1013.eqiad.wmnet with reason: host reimage
  • 19:19 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1013.eqiad.wmnet with reason: host reimage
  • 18:51 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host maps1013.eqiad.wmnet with OS bookworm
  • 18:47 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps1012.eqiad.wmnet with OS bookworm
  • 18:47 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 18:47 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 18:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1012.eqiad.wmnet with reason: host reimage
  • 18:24 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1012.eqiad.wmnet with reason: host reimage
  • 18:20 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.17 refs T396378
  • 18:06 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps1011.eqiad.wmnet with OS bookworm
  • 18:04 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 18:03 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 17:56 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host maps1012.eqiad.wmnet with OS bookworm
  • 17:54 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:48 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1011.eqiad.wmnet with reason: host reimage
  • 17:39 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1011.eqiad.wmnet with reason: host reimage
  • 17:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:37 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps1014
  • 17:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps1014
  • 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:35 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt maps1011 - vriley@cumin1003"
  • 17:35 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt maps1011 - vriley@cumin1003"
  • 17:30 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 17:19 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:18 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:12 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host maps1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:11 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host maps1011.eqiad.wmnet with OS bookworm
  • 17:09 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:08 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T402925)', diff saved to https://phabricator.wikimedia.org/P82507 and previous config saved to /var/cache/conftool/dbconfig/20250903-170800-ladsgroup.json
  • 17:04 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:04 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:01 sukhe: rolling out CRs 1180969, 1183212, 1180577, -b31 A:cp: T401595
  • 16:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P82506 and previous config saved to /var/cache/conftool/dbconfig/20250903-165253-ladsgroup.json
  • 16:49 urbanecm@deploy1003: Finished scap sync-world: Backport for [Growth] enwiki: Deploy "Add a link" to 100% of users (T395524) (duration: 09m 40s)
  • 16:44 urbanecm@deploy1003: urbanecm, cyndywikime: Continuing with sync
  • 16:44 urbanecm@deploy1003: urbanecm, cyndywikime: Backport for [Growth] enwiki: Deploy "Add a link" to 100% of users (T395524) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:43 sukhe: merging CR 1183212: T401595
  • 16:40 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] enwiki: Deploy "Add a link" to 100% of users (T395524)
  • 16:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P82505 and previous config saved to /var/cache/conftool/dbconfig/20250903-163745-ladsgroup.json
  • 16:35 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1180969'": T401595
  • 16:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T402925)', diff saved to https://phabricator.wikimedia.org/P82504 and previous config saved to /var/cache/conftool/dbconfig/20250903-162237-ladsgroup.json
  • 16:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 16:12 mutante: people1005 - systemctl start wmf_auto_restart_envoyproxy.service
  • 16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T401906)', diff saved to https://phabricator.wikimedia.org/P82503 and previous config saved to /var/cache/conftool/dbconfig/20250903-161252-fceratto.json
  • 16:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for people2004.codfw.wmnet
  • 16:11 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for people2004.codfw.wmnet
  • 16:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for people1005.eqiad.wmnet
  • 16:11 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for people1005.eqiad.wmnet
  • 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P82502 and previous config saved to /var/cache/conftool/dbconfig/20250903-155744-fceratto.json
  • 15:55 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:54 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:47 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on people2004.codfw.wmnet with reason: debugging
  • 15:44 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on people1005.eqiad.wmnet with reason: debugging
  • 15:43 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P82501 and previous config saved to /var/cache/conftool/dbconfig/20250903-154237-fceratto.json
  • 15:39 ariel@deploy1003: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=nowiki --logwiki=metawiki DSBinfo Nordlysoversola # T403581
  • 15:38 ariel@deploy1003: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=no.wikipedia.org --logwiki=metawiki DSBinfo Nordlysoversola # T403581
  • 15:38 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:37 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T401906)', diff saved to https://phabricator.wikimedia.org/P82500 and previous config saved to /var/cache/conftool/dbconfig/20250903-152729-fceratto.json
  • 15:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T401906)', diff saved to https://phabricator.wikimedia.org/P82499 and previous config saved to /var/cache/conftool/dbconfig/20250903-152519-fceratto.json
  • 15:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 15:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T401906)', diff saved to https://phabricator.wikimedia.org/P82498 and previous config saved to /var/cache/conftool/dbconfig/20250903-152457-fceratto.json
  • 15:15 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Instrument CentralAuthUser::getBlocks (T401701) (duration: 12m 03s)
  • 15:10 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 15:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P82497 and previous config saved to /var/cache/conftool/dbconfig/20250903-150949-fceratto.json
  • 15:09 dreamyjazz@deploy1003: dreamyjazz: Backport for Instrument CentralAuthUser::getBlocks (T401701) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1012.eqiad.wmnet with OS trixie
  • 15:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Instrument CentralAuthUser::getBlocks (T401701)
  • 15:00 urandom: upgrading envoyproxy to 1.26.8-1, restbase/codfw — T402584
  • 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P82496 and previous config saved to /var/cache/conftool/dbconfig/20250903-145441-fceratto.json
  • 14:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 14:50 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T401906)', diff saved to https://phabricator.wikimedia.org/P82495 and previous config saved to /var/cache/conftool/dbconfig/20250903-143934-fceratto.json
  • 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T401906)', diff saved to https://phabricator.wikimedia.org/P82494 and previous config saved to /var/cache/conftool/dbconfig/20250903-143724-fceratto.json
  • 14:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T401906)', diff saved to https://phabricator.wikimedia.org/P82493 and previous config saved to /var/cache/conftool/dbconfig/20250903-143701-fceratto.json
  • 14:28 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS trixie
  • 14:24 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 14:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 14:23 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 14:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P82491 and previous config saved to /var/cache/conftool/dbconfig/20250903-142154-fceratto.json
  • 14:21 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Update logging (duration: 10m 43s)
  • 14:21 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:18 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:16 kharlan@deploy1003: kharlan: Continuing with sync
  • 14:15 kharlan@deploy1003: kharlan: Backport for hCaptcha: Update logging synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:11 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Update logging
  • 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P82490 and previous config saved to /var/cache/conftool/dbconfig/20250903-140646-fceratto.json
  • 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:01 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:00 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:59 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS trixie
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T401906)', diff saved to https://phabricator.wikimedia.org/P82489 and previous config saved to /var/cache/conftool/dbconfig/20250903-135139-fceratto.json
  • 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T401906)', diff saved to https://phabricator.wikimedia.org/P82488 and previous config saved to /var/cache/conftool/dbconfig/20250903-135028-fceratto.json
  • 13:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T401906)', diff saved to https://phabricator.wikimedia.org/P82487 and previous config saved to /var/cache/conftool/dbconfig/20250903-135005-fceratto.json
  • 13:49 urandom: upgrading envoyproxy to 1.26.8-1, restbase/eqiad (cassandra) rack 'd' — T402584
  • 13:49 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:47 mvolz@deploy1003: Finished scap sync-world: Backport for Remove all references to deprecated parameter (T361576) (duration: 17m 13s)
  • 13:45 Amir1: dropping all unused tables of securepoll in s3 (T395928)
  • 13:45 urandom: upgrading envoyproxy to 1.26.8-1, restbase/eqiad (cassandra) rack 'b' — T402584
  • 13:42 mvolz@deploy1003: mvolz: Continuing with sync
  • 13:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1244 (T402925)', diff saved to https://phabricator.wikimedia.org/P82485 and previous config saved to /var/cache/conftool/dbconfig/20250903-134043-ladsgroup.json
  • 13:40 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T402925)', diff saved to https://phabricator.wikimedia.org/P82484 and previous config saved to /var/cache/conftool/dbconfig/20250903-134019-ladsgroup.json
  • 13:36 mvolz@deploy1003: mvolz: Backport for Remove all references to deprecated parameter (T361576) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:35 urandom: upgrading envoyproxy to 1.26.8-1, restbase/eqiad (cassandra) rack 'a' — T402584
  • 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P82483 and previous config saved to /var/cache/conftool/dbconfig/20250903-133457-fceratto.json
  • 13:30 mvolz@deploy1003: Started scap sync-world: Backport for Remove all references to deprecated parameter (T361576)
  • 13:26 cscott@deploy1003: Finished scap sync-world: Backport for Replace ParamType with ListType (duration: 15m 06s)
  • 13:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P82482 and previous config saved to /var/cache/conftool/dbconfig/20250903-132512-ladsgroup.json
  • 13:21 cscott@deploy1003: cscott: Continuing with sync
  • 13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P82481 and previous config saved to /var/cache/conftool/dbconfig/20250903-131950-fceratto.json
  • 13:18 cscott@deploy1003: cscott: Backport for Replace ParamType with ListType synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:11 cscott@deploy1003: Started scap sync-world: Backport for Replace ParamType with ListType
  • 13:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P82480 and previous config saved to /var/cache/conftool/dbconfig/20250903-131004-ladsgroup.json
  • 13:09 jgleeson: fundraising-tools upgraded from 284579a9 to 3fba9888
  • 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T401906)', diff saved to https://phabricator.wikimedia.org/P82479 and previous config saved to /var/cache/conftool/dbconfig/20250903-130442-fceratto.json
  • 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T401906)', diff saved to https://phabricator.wikimedia.org/P82478 and previous config saved to /var/cache/conftool/dbconfig/20250903-130232-fceratto.json
  • 13:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 12:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum3003.esams.wmnet
  • 12:59 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:56 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 12:55 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3003.wikimedia.org
  • 12:55 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T402925)', diff saved to https://phabricator.wikimedia.org/P82477 and previous config saved to /var/cache/conftool/dbconfig/20250903-125456-ladsgroup.json
  • 12:52 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 12:52 ayounsi@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host atlas3001.wikimedia.org
  • 12:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
  • 12:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
  • 12:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 12:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 12:47 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:47 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:45 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts doh3003.wikimedia.org
  • 12:44 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
  • 12:44 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
  • 12:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 12:43 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 12:43 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts durum3003.esams.wmnet
  • 12:40 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:40 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host atlas3001.wikimedia.org
  • 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T401906)', diff saved to https://phabricator.wikimedia.org/P82476 and previous config saved to /var/cache/conftool/dbconfig/20250903-121648-fceratto.json
  • 12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T401906)', diff saved to https://phabricator.wikimedia.org/P82475 and previous config saved to /var/cache/conftool/dbconfig/20250903-121538-fceratto.json
  • 12:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T401906)', diff saved to https://phabricator.wikimedia.org/P82474 and previous config saved to /var/cache/conftool/dbconfig/20250903-121514-fceratto.json
  • 12:04 Amir1: dropping objectcache table in group0 (T397367)
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P82473 and previous config saved to /var/cache/conftool/dbconfig/20250903-120007-fceratto.json
  • 11:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1234.eqiad.wmnet with OS bullseye
  • 11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P82472 and previous config saved to /var/cache/conftool/dbconfig/20250903-114500-fceratto.json
  • 11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T401906)', diff saved to https://phabricator.wikimedia.org/P82471 and previous config saved to /var/cache/conftool/dbconfig/20250903-112952-fceratto.json
  • 11:29 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1234.eqiad.wmnet with reason: host reimage
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T401906)', diff saved to https://phabricator.wikimedia.org/P82470 and previous config saved to /var/cache/conftool/dbconfig/20250903-112842-fceratto.json
  • 11:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T401906)', diff saved to https://phabricator.wikimedia.org/P82469 and previous config saved to /var/cache/conftool/dbconfig/20250903-112820-fceratto.json
  • 11:25 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:25 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1234.eqiad.wmnet with reason: host reimage
  • 11:25 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:24 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:24 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:21 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:21 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P82468 and previous config saved to /var/cache/conftool/dbconfig/20250903-111313-fceratto.json
  • 11:12 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:10 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:10 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:08 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:08 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:01 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1234.eqiad.wmnet with OS bullseye
  • 11:01 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1234.eqiad.wmnet with OS bullseye
  • 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P82467 and previous config saved to /var/cache/conftool/dbconfig/20250903-105805-fceratto.json
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T401906)', diff saved to https://phabricator.wikimedia.org/P82466 and previous config saved to /var/cache/conftool/dbconfig/20250903-104257-fceratto.json
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T401906)', diff saved to https://phabricator.wikimedia.org/P82465 and previous config saved to /var/cache/conftool/dbconfig/20250903-104047-fceratto.json
  • 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T401906)', diff saved to https://phabricator.wikimedia.org/P82464 and previous config saved to /var/cache/conftool/dbconfig/20250903-104023-fceratto.json
  • 10:28 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P82463 and previous config saved to /var/cache/conftool/dbconfig/20250903-102516-fceratto.json
  • 10:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 10:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1243 (T402925)', diff saved to https://phabricator.wikimedia.org/P82462 and previous config saved to /var/cache/conftool/dbconfig/20250903-101749-ladsgroup.json
  • 10:17 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T402925)', diff saved to https://phabricator.wikimedia.org/P82461 and previous config saved to /var/cache/conftool/dbconfig/20250903-101725-ladsgroup.json
  • 10:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P82460 and previous config saved to /var/cache/conftool/dbconfig/20250903-101008-fceratto.json
  • 10:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P82459 and previous config saved to /var/cache/conftool/dbconfig/20250903-100218-ladsgroup.json
  • 10:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS trixie
  • 09:58 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 09:58 jayme@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 09:57 jayme@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1002
  • 09:57 jayme@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1002"
  • 09:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T401906)', diff saved to https://phabricator.wikimedia.org/P82458 and previous config saved to /var/cache/conftool/dbconfig/20250903-095501-fceratto.json
  • 09:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T401906)', diff saved to https://phabricator.wikimedia.org/P82457 and previous config saved to /var/cache/conftool/dbconfig/20250903-095251-fceratto.json
  • 09:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T401906)', diff saved to https://phabricator.wikimedia.org/P82456 and previous config saved to /var/cache/conftool/dbconfig/20250903-095228-fceratto.json
  • 09:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P82455 and previous config saved to /var/cache/conftool/dbconfig/20250903-094710-ladsgroup.json
  • 09:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P82454 and previous config saved to /var/cache/conftool/dbconfig/20250903-093720-fceratto.json
  • 09:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T402925)', diff saved to https://phabricator.wikimedia.org/P82453 and previous config saved to /var/cache/conftool/dbconfig/20250903-093202-ladsgroup.json
  • 09:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P82452 and previous config saved to /var/cache/conftool/dbconfig/20250903-092213-fceratto.json
  • 09:21 kartik@deploy1003: Finished scap sync-world: Backport for CX section positioning: Fix cxserver requests to include /v2 in the URL (T386131) (duration: 19m 26s)
  • 09:15 kartik@deploy1003: kartik: Continuing with sync
  • 09:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T401906)', diff saved to https://phabricator.wikimedia.org/P82451 and previous config saved to /var/cache/conftool/dbconfig/20250903-090705-fceratto.json
  • 09:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T401906)', diff saved to https://phabricator.wikimedia.org/P82450 and previous config saved to /var/cache/conftool/dbconfig/20250903-090556-fceratto.json
  • 09:05 kartik@deploy1003: kartik: Backport for CX section positioning: Fix cxserver requests to include /v2 in the URL (T386131) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 09:01 kartik@deploy1003: Started scap sync-world: Backport for CX section positioning: Fix cxserver requests to include /v2 in the URL (T386131)
  • 08:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:48 ayounsi@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host atlas3001.wikimedia.org
  • 08:48 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
  • 08:48 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
  • 08:47 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:47 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:47 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:43 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
  • 08:43 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
  • 08:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:43 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1003"
  • 08:42 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1234.eqiad.wmnet with OS bullseye
  • 08:42 kartik@deploy1003: Finished scap sync-world: Backport for ContentTranslation: Add cxserver host for server-side requests (T386131) (duration: 45m 46s)
  • 08:41 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 08:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from dumpsdata1005 to an-worker1234
  • 08:40 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1234
  • 08:39 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:39 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1234
  • 08:39 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1234 on all recursors
  • 08:38 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-worker1234 on all recursors
  • 08:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:38 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming dumpsdata1005 to an-worker1234 - btullis@cumin1003"
  • 08:38 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming dumpsdata1005 to an-worker1234 - btullis@cumin1003"
  • 08:37 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:37 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host atlas3001.wikimedia.org
  • 08:36 kartik@deploy1003: ngkountas, kartik: Continuing with sync
  • 08:35 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 08:34 btullis@cumin1003: START - Cookbook sre.hosts.rename from dumpsdata1005 to an-worker1234
  • 08:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from dumpsdata1004 to an-worker1233
  • 08:33 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1233
  • 08:32 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1233
  • 08:32 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-worker1233 on all recursors
  • 08:32 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-worker1233 on all recursors
  • 08:32 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update esams sandbox IPs to routed ganeti - ayounsi@cumin1003"
  • 08:31 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update esams sandbox IPs to routed ganeti - ayounsi@cumin1003"
  • 08:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:26 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:23 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:03 kartik@deploy1003: ngkountas, kartik: Backport for ContentTranslation: Add cxserver host for server-side requests (T386131) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:02 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045']
  • 08:02 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045']
  • 08:02 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cp2045']
  • 08:01 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cp2045']
  • 07:59 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:56 kartik@deploy1003: Started scap sync-world: Backport for ContentTranslation: Add cxserver host for server-side requests (T386131)
  • 07:54 kartik@deploy1003: Finished scap sync-world: Backport for CxServerClient: Log url instead of relative path upon failure (T386131) (duration: 16m 46s)
  • 07:49 kartik@deploy1003: kartik, sbisson: Continuing with sync
  • 07:44 kartik@deploy1003: kartik, sbisson: Backport for CxServerClient: Log url instead of relative path upon failure (T386131) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:38 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:37 kartik@deploy1003: Started scap sync-world: Backport for CxServerClient: Log url instead of relative path upon failure (T386131)
  • 07:37 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 07:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 07:35 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 07:30 kartik@deploy1003: Finished scap sync-world: Backport for Revert "UIC: Avoid fetching revisions from wikis to make list of active wikis" (duration: 11m 54s)
  • 07:24 kartik@deploy1003: kartik, mszwarc: Continuing with sync
  • 07:23 kartik@deploy1003: kartik, mszwarc: Backport for Revert "UIC: Avoid fetching revisions from wikis to make list of active wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:18 kartik@deploy1003: Started scap sync-world: Backport for Revert "UIC: Avoid fetching revisions from wikis to make list of active wikis"
  • 06:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 06:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1242 (T402925)', diff saved to https://phabricator.wikimedia.org/P82448 and previous config saved to /var/cache/conftool/dbconfig/20250903-064830-ladsgroup.json
  • 06:48 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 06:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T402925)', diff saved to https://phabricator.wikimedia.org/P82447 and previous config saved to /var/cache/conftool/dbconfig/20250903-064807-ladsgroup.json
  • 06:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P82446 and previous config saved to /var/cache/conftool/dbconfig/20250903-063259-ladsgroup.json
  • 06:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P82445 and previous config saved to /var/cache/conftool/dbconfig/20250903-061752-ladsgroup.json
  • 06:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T402925)', diff saved to https://phabricator.wikimedia.org/P82444 and previous config saved to /var/cache/conftool/dbconfig/20250903-060244-ladsgroup.json
  • 03:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1241 (T402925)', diff saved to https://phabricator.wikimedia.org/P82443 and previous config saved to /var/cache/conftool/dbconfig/20250903-032556-ladsgroup.json
  • 03:25 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T402925)', diff saved to https://phabricator.wikimedia.org/P82442 and previous config saved to /var/cache/conftool/dbconfig/20250903-032534-ladsgroup.json
  • 03:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P82441 and previous config saved to /var/cache/conftool/dbconfig/20250903-031026-ladsgroup.json
  • 02:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P82440 and previous config saved to /var/cache/conftool/dbconfig/20250903-025518-ladsgroup.json
  • 02:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T402925)', diff saved to https://phabricator.wikimedia.org/P82439 and previous config saved to /var/cache/conftool/dbconfig/20250903-024011-ladsgroup.json
  • 02:15 eileen: config revision changed from 58dce716 to 544c968d
  • 02:01 eileen: civicrm upgraded from 60b06928 to a8f49cc4
  • 01:47 eileen: civicrm upgraded from 0cf1ef03 to 60b06928
  • 01:28 eileen: config revision changed from 01b64dec to 58dce716
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 45s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1238 (T402925)', diff saved to https://phabricator.wikimedia.org/P82437 and previous config saved to /var/cache/conftool/dbconfig/20250903-000651-ladsgroup.json
  • 00:06 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T402925)', diff saved to https://phabricator.wikimedia.org/P82436 and previous config saved to /var/cache/conftool/dbconfig/20250903-000629-ladsgroup.json

2025-09-02

  • 23:53 eileen: config aebeab81 -> 58dce716
  • 23:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P82435 and previous config saved to /var/cache/conftool/dbconfig/20250902-235121-ladsgroup.json
  • 23:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P82434 and previous config saved to /var/cache/conftool/dbconfig/20250902-233615-ladsgroup.json
  • 23:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T402925)', diff saved to https://phabricator.wikimedia.org/P82433 and previous config saved to /var/cache/conftool/dbconfig/20250902-232107-ladsgroup.json
  • 22:06 Lucas_WMDE: UTC late backport+config window (belatedly) done
  • 22:06 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Revert "Set $wgPHPSessionHandling to 'disable' on group0 wikis" (T362324 T403519) (duration: 10m 03s)
  • 22:00 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister, lucaswerkmeister-wmde: Continuing with sync
  • 22:00 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister, lucaswerkmeister-wmde: Backport for Revert "Set $wgPHPSessionHandling to 'disable' on group0 wikis" (T362324 T403519) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:55 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Revert "Set $wgPHPSessionHandling to 'disable' on group0 wikis" (T362324 T403519)
  • 21:54 jdlrobson@deploy1003: Finished scap sync-world: Backport for Remove deprecated search config (T402208) (duration: 12m 06s)
  • 21:48 jdlrobson@deploy1003: jdlrobson, bwang: Continuing with sync
  • 21:48 jdlrobson@deploy1003: jdlrobson, bwang: Backport for Remove deprecated search config (T402208) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:42 jdlrobson@deploy1003: Started scap sync-world: Backport for Remove deprecated search config (T402208)
  • 21:31 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 59s)
  • 21:29 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 11m 18s)
  • 21:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Revert "Set $wgPHPSessionHandling to 'disable' on group1 wikis" (T362324 T403519) (duration: 13m 27s)
  • 21:09 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, lucaswerkmeister: Continuing with sync
  • 21:06 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, lucaswerkmeister: Backport for Revert "Set $wgPHPSessionHandling to 'disable' on group1 wikis" (T362324 T403519) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1221 (T402925)', diff saved to https://phabricator.wikimedia.org/P82430 and previous config saved to /var/cache/conftool/dbconfig/20250902-210259-ladsgroup.json
  • 21:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T402925)', diff saved to https://phabricator.wikimedia.org/P82429 and previous config saved to /var/cache/conftool/dbconfig/20250902-210229-ladsgroup.json
  • 21:01 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Revert "Set $wgPHPSessionHandling to 'disable' on group1 wikis" (T362324 T403519)
  • 20:58 kemayo@deploy1003: Finished scap sync-world: Backport for Restore ext.visualEditor.track module (T403127) (duration: 12m 20s)
  • 20:52 kemayo@deploy1003: jdlrobson, kemayo: Continuing with sync
  • 20:51 kemayo@deploy1003: jdlrobson, kemayo: Backport for Restore ext.visualEditor.track module (T403127) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P82428 and previous config saved to /var/cache/conftool/dbconfig/20250902-204722-ladsgroup.json
  • 20:45 kemayo@deploy1003: Started scap sync-world: Backport for Restore ext.visualEditor.track module (T403127)
  • 20:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P82427 and previous config saved to /var/cache/conftool/dbconfig/20250902-203212-ladsgroup.json
  • 20:31 kemayo@deploy1003: Finished scap sync-world: Backport for Edit check: set up the tone check a/b test (T389231 T402195), Edit check: log to VEFU if a tone check would have been shown if not for the a/b test (T394952) (duration: 12m 34s)
  • 20:25 kemayo@deploy1003: kemayo: Continuing with sync
  • 20:24 kemayo@deploy1003: kemayo: Backport for Edit check: set up the tone check a/b test (T389231 T402195), Edit check: log to VEFU if a tone check would have been shown if not for the a/b test (T394952) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 kemayo@deploy1003: Started scap sync-world: Backport for Edit check: set up the tone check a/b test (T389231 T402195), Edit check: log to VEFU if a tone check would have been shown if not for the a/b test (T394952)
  • 20:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T402925)', diff saved to https://phabricator.wikimedia.org/P82426 and previous config saved to /var/cache/conftool/dbconfig/20250902-201705-ladsgroup.json
  • 20:15 dani@deploy1003: Finished scap sync-world: Backport for Pre-deploy Newcomers survey on enwiki (T402915) (duration: 11m 53s)
  • 20:09 dani@deploy1003: dani: Continuing with sync
  • 20:09 dani@deploy1003: dani: Backport for Pre-deploy Newcomers survey on enwiki (T402915) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:03 dani@deploy1003: Started scap sync-world: Backport for Pre-deploy Newcomers survey on enwiki (T402915)
  • 20:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:01 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:49 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:33 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:32 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:31 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 19:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:47 ladsgroup@deploy1003: Finished scap sync-world: Backport for Stop writing to categorylinks old in enwiki (T399579) (duration: 11m 57s)
  • 18:41 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 18:39 ladsgroup@deploy1003: ladsgroup: Backport for Stop writing to categorylinks old in enwiki (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:35 ladsgroup@deploy1003: Started scap sync-world: Backport for Stop writing to categorylinks old in enwiki (T399579)
  • 18:15 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.17 refs T396378
  • 17:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:28 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti3005']
  • 17:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1199 (T402925)', diff saved to https://phabricator.wikimedia.org/P82423 and previous config saved to /var/cache/conftool/dbconfig/20250902-172741-ladsgroup.json
  • 17:27 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T402925)', diff saved to https://phabricator.wikimedia.org/P82422 and previous config saved to /var/cache/conftool/dbconfig/20250902-172718-ladsgroup.json
  • 17:15 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 17:14 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ganeti3005']
  • 17:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti3005']
  • 17:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P82421 and previous config saved to /var/cache/conftool/dbconfig/20250902-171210-ladsgroup.json
  • 16:57 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host maps2011.codfw.wmnet with OS bookworm
  • 16:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P82420 and previous config saved to /var/cache/conftool/dbconfig/20250902-165702-ladsgroup.json
  • 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T401906)', diff saved to https://phabricator.wikimedia.org/P82419 and previous config saved to /var/cache/conftool/dbconfig/20250902-164727-fceratto.json
  • 16:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T402925)', diff saved to https://phabricator.wikimedia.org/P82418 and previous config saved to /var/cache/conftool/dbconfig/20250902-164155-ladsgroup.json
  • 16:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 16:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 16:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P82417 and previous config saved to /var/cache/conftool/dbconfig/20250902-163219-fceratto.json
  • 16:29 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cp2042.codfw.wmnet
  • 16:29 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2042.codfw.wmnet
  • 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P82416 and previous config saved to /var/cache/conftool/dbconfig/20250902-161711-fceratto.json
  • 16:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 16:04 ladsgroup@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1163 gradually with 4 steps - Maint over
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T401906)', diff saved to https://phabricator.wikimedia.org/P82414 and previous config saved to /var/cache/conftool/dbconfig/20250902-160204-fceratto.json
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T401906)', diff saved to https://phabricator.wikimedia.org/P82413 and previous config saved to /var/cache/conftool/dbconfig/20250902-155942-fceratto.json
  • 15:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T401906)', diff saved to https://phabricator.wikimedia.org/P82412 and previous config saved to /var/cache/conftool/dbconfig/20250902-155918-fceratto.json
  • 15:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2011.codfw.wmnet with OS bookworm
  • 15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P82410 and previous config saved to /var/cache/conftool/dbconfig/20250902-154409-fceratto.json
  • 15:42 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp
  • 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 15:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P82408 and previous config saved to /var/cache/conftool/dbconfig/20250902-152902-fceratto.json
  • 15:22 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Add the CheckUserMatchSuggestedInvestigationsSignalAgainstUser hook (T403111) (duration: 10m 25s)
  • 15:19 ladsgroup@cumin1003: START - Cookbook sre.mysql.pool db1163 gradually with 4 steps - Maint over
  • 15:16 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 15:16 dreamyjazz@deploy1003: dreamyjazz: Backport for Add the CheckUserMatchSuggestedInvestigationsSignalAgainstUser hook (T403111) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T401906)', diff saved to https://phabricator.wikimedia.org/P82406 and previous config saved to /var/cache/conftool/dbconfig/20250902-151354-fceratto.json
  • 15:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 15:12 dreamyjazz@deploy1003: Started scap sync-world: Backport for Add the CheckUserMatchSuggestedInvestigationsSignalAgainstUser hook (T403111)
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T401906)', diff saved to https://phabricator.wikimedia.org/P82405 and previous config saved to /var/cache/conftool/dbconfig/20250902-151131-fceratto.json
  • 15:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T401906)', diff saved to https://phabricator.wikimedia.org/P82404 and previous config saved to /var/cache/conftool/dbconfig/20250902-151108-fceratto.json
  • 15:09 kartik@deploy1003: Finished scap sync-world: Backport for Revert "ContentTranslation: Add cxserver host for server-side requests" (duration: 12m 42s)
  • 15:08 brennen@deploy1003: Finished deploy [phabricator/deployment@6e0b4b1]: deploy phab1004 for T403494 (duration: 00m 43s)
  • 15:07 brennen@deploy1003: Started deploy [phabricator/deployment@6e0b4b1]: deploy phab1004 for T403494
  • 15:07 brennen@deploy1003: Finished deploy [phabricator/deployment@6e0b4b1]: deploy phab2002 for T403494 (duration: 00m 43s)
  • 15:06 brennen@deploy1003: Started deploy [phabricator/deployment@6e0b4b1]: deploy phab2002 for T403494
  • 15:04 kartik@deploy1003: kartik: Continuing with sync
  • 15:03 kartik@deploy1003: kartik: Backport for Revert "ContentTranslation: Add cxserver host for server-side requests" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:56 kartik@deploy1003: Started scap sync-world: Backport for Revert "ContentTranslation: Add cxserver host for server-side requests"
  • 14:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P82403 and previous config saved to /var/cache/conftool/dbconfig/20250902-145601-fceratto.json
  • 14:53 kartik@deploy1003: Sync cancelled.
  • 14:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P82402 and previous config saved to /var/cache/conftool/dbconfig/20250902-144053-fceratto.json
  • 14:26 XioNoX: codfw: remove lvs static routes - T300877
  • 14:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T401906)', diff saved to https://phabricator.wikimedia.org/P82401 and previous config saved to /var/cache/conftool/dbconfig/20250902-142545-fceratto.json
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T401906)', diff saved to https://phabricator.wikimedia.org/P82400 and previous config saved to /var/cache/conftool/dbconfig/20250902-142322-fceratto.json
  • 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T401906)', diff saved to https://phabricator.wikimedia.org/P82399 and previous config saved to /var/cache/conftool/dbconfig/20250902-142259-fceratto.json
  • 14:15 XioNoX: ulsfo: remove lvs static routes - T300877
  • 14:09 kartik@deploy1003: kartik, ngkountas: Backport for ContentTranslation: Add cxserver host for server-side requests (T386131) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:09 XioNoX: eqsin: remove lvs static routes - T300877
  • 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P82398 and previous config saved to /var/cache/conftool/dbconfig/20250902-140751-fceratto.json
  • 14:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:02 kartik@deploy1003: Started scap sync-world: Backport for ContentTranslation: Add cxserver host for server-side requests (T386131)
  • 13:58 kartik@deploy1003: Finished scap sync-world: Backport for idwiki: Add extended confirmed usergroup & restriction level (T402755) (duration: 12m 51s)
  • 13:57 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS trixie
  • 13:55 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy2003.codfw.wmnet with OS bookworm
  • 13:55 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 13:55 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 13:52 kartik@deploy1003: kartik, anzx: Continuing with sync
  • 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P82397 and previous config saved to /var/cache/conftool/dbconfig/20250902-135243-fceratto.json
  • 13:52 kartik@deploy1003: kartik, anzx: Backport for idwiki: Add extended confirmed usergroup & restriction level (T402755) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:50 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:50 btullis@cumin1003: START - Cookbook sre.hosts.rename from dumpsdata1004 to an-worker1233
  • 13:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1190 (T402925)', diff saved to https://phabricator.wikimedia.org/P82396 and previous config saved to /var/cache/conftool/dbconfig/20250902-134854-ladsgroup.json
  • 13:48 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3005.wikimedia.org
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh3005.wikimedia.org with OS bookworm
  • 13:46 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:45 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum2001.codfw.wmnet with OS trixie
  • 13:45 kartik@deploy1003: Started scap sync-world: Backport for idwiki: Add extended confirmed usergroup & restriction level (T402755)
  • 13:42 kartik@deploy1003: Finished scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on group1 wikis (T362324) (duration: 17m 28s)
  • 13:40 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T401906)', diff saved to https://phabricator.wikimedia.org/P82395 and previous config saved to /var/cache/conftool/dbconfig/20250902-133736-fceratto.json
  • 13:36 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy2003.codfw.wmnet with reason: host reimage
  • 13:36 kartik@deploy1003: hokwelum, kartik: Continuing with sync
  • 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T401906)', diff saved to https://phabricator.wikimedia.org/P82394 and previous config saved to /var/cache/conftool/dbconfig/20250902-133513-fceratto.json
  • 13:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P82392 and previous config saved to /var/cache/conftool/dbconfig/20250902-131845-fceratto.json
  • 13:16 stran@deploy1003: tchanders, stran: Continuing with sync
  • 13:11 stran@deploy1003: tchanders, stran: Backport for Document that IP reveal permissions can't just be reassigned (T396217), Enable temporary accounts on remaining small-sized projects (T402181) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:09 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum4001.ulsfo.wmnet with OS trixie
  • 13:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host doh3005.wikimedia.org with OS bookworm
  • 13:07 Emperor: install libpython3.9-dbg python3.9-dbg on ms-fe2016 for debugging
  • 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3005.wikimedia.org - jmm@cumin2002"
  • 13:06 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum2001.codfw.wmnet with OS trixie
  • 13:06 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh3005.wikimedia.org - jmm@cumin2002"
  • 13:05 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host deploy2003.codfw.wmnet with OS bookworm
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh3005.wikimedia.org on all recursors
  • 13:05 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh3005.wikimedia.org on all recursors
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3005.wikimedia.org - jmm@cumin2002"
  • 13:05 stran@deploy1003: Started scap sync-world: Backport for Document that IP reveal permissions can't just be reassigned (T396217), Enable temporary accounts on remaining small-sized projects (T402181)
  • 13:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh3005.wikimedia.org - jmm@cumin2002"
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P82391 and previous config saved to /var/cache/conftool/dbconfig/20250902-130338-fceratto.json
  • 13:00 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh3005.wikimedia.org
  • 12:58 jmm@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir3003.esams.wmnet
  • 12:51 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1163.eqiad.wmnet with reason: Old primary of s1
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T401906)', diff saved to https://phabricator.wikimedia.org/P82390 and previous config saved to /var/cache/conftool/dbconfig/20250902-124830-fceratto.json
  • 12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T401906)', diff saved to https://phabricator.wikimedia.org/P82389 and previous config saved to /var/cache/conftool/dbconfig/20250902-124608-fceratto.json
  • 12:46 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 12:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T401906)', diff saved to https://phabricator.wikimedia.org/P82388 and previous config saved to /var/cache/conftool/dbconfig/20250902-124545-fceratto.json
  • 12:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:44 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts an-mariadb1001.eqiad.wmnet
  • 12:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet
  • 12:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1163.eqiad.wmnet
  • 12:40 sukhe@cumin1003: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp
  • 12:35 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet
  • 12:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1163 - Upgrading db1163.eqiad.wmnet
  • 12:31 ladsgroup@cumin1002: START - Cookbook sre.mysql.depool db1163 - Upgrading db1163.eqiad.wmnet
  • 12:31 ladsgroup@cumin1002: START - Cookbook sre.mysql.upgrade for db1163.eqiad.wmnet
  • 12:31 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-mariadb1001.eqiad.wmnet
  • 12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P82387 and previous config saved to /var/cache/conftool/dbconfig/20250902-123038-fceratto.json
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum3005.esams.wmnet
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3005.esams.wmnet with OS bookworm
  • 12:28 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1163.eqiad.wmnet with reason: Old primary of s1
  • 12:25 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir3005.esams.wmnet
  • 12:24 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir3005.esams.wmnet
  • 12:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool db1163 T402870', diff saved to https://phabricator.wikimedia.org/P82385 and previous config saved to /var/cache/conftool/dbconfig/20250902-122310-ladsgroup.json
  • 12:21 ladsgroup@dns1004: END - running authdns-update
  • 12:20 ladsgroup@dns1004: START - running authdns-update
  • 12:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Promote db1184 to s1 primary and set section read-write T402870', diff saved to https://phabricator.wikimedia.org/P82384 and previous config saved to /var/cache/conftool/dbconfig/20250902-121814-ladsgroup.json
  • 12:16 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:15 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T402870', diff saved to https://phabricator.wikimedia.org/P82383 and previous config saved to /var/cache/conftool/dbconfig/20250902-121548-ladsgroup.json
  • 12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P82382 and previous config saved to /var/cache/conftool/dbconfig/20250902-121531-fceratto.json
  • 12:15 Amir1: Starting s1 eqiad failover from db1163 to db1184 - T402870
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3005.esams.wmnet with reason: host reimage
  • 12:11 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:10 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3005.esams.wmnet with reason: host reimage
  • 12:08 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 12:02 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:01 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:01 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:00 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T401906)', diff saved to https://phabricator.wikimedia.org/P82381 and previous config saved to /var/cache/conftool/dbconfig/20250902-120020-fceratto.json
  • 11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T401906)', diff saved to https://phabricator.wikimedia.org/P82380 and previous config saved to /var/cache/conftool/dbconfig/20250902-115754-fceratto.json
  • 11:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T401906)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250902-115727-fceratto.json
  • 11:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum3005.esams.wmnet with OS bookworm
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3005.esams.wmnet - jmm@cumin2002"
  • 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum3005.esams.wmnet - jmm@cumin2002"
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum3005.esams.wmnet on all recursors
  • 11:44 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum3005.esams.wmnet on all recursors
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3005.esams.wmnet - jmm@cumin2002"
  • 11:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Set db1184 with weight 0 T402870', diff saved to https://phabricator.wikimedia.org/P82377 and previous config saved to /var/cache/conftool/dbconfig/20250902-114408-ladsgroup.json
  • 11:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T402870
  • 11:43 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:43 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P82376 and previous config saved to /var/cache/conftool/dbconfig/20250902-114219-fceratto.json
  • 11:42 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum3005.esams.wmnet - jmm@cumin2002"
  • 11:38 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum3005.esams.wmnet
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir3005.esams.wmnet
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir3005.esams.wmnet with OS bookworm
  • 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P82375 and previous config saved to /var/cache/conftool/dbconfig/20250902-112711-fceratto.json
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3005.esams.wmnet with reason: host reimage
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T401906)', diff saved to https://phabricator.wikimedia.org/P82374 and previous config saved to /var/cache/conftool/dbconfig/20250902-111203-fceratto.json
  • 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T401906)', diff saved to https://phabricator.wikimedia.org/P82373 and previous config saved to /var/cache/conftool/dbconfig/20250902-110942-fceratto.json
  • 11:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T401906)', diff saved to https://phabricator.wikimedia.org/P82372 and previous config saved to /var/cache/conftool/dbconfig/20250902-110919-fceratto.json
  • 11:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3005.esams.wmnet with reason: host reimage
  • 11:04 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P82371 and previous config saved to /var/cache/conftool/dbconfig/20250902-105411-fceratto.json
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir3005.esams.wmnet with OS bookworm
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 10:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:40 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:40 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3005.esams.wmnet
  • 10:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir3005.esams.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 10:40 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P82370 and previous config saved to /var/cache/conftool/dbconfig/20250902-103901-fceratto.json
  • 10:38 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 10:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 10:37 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:36 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:36 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:36 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:36 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:35 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1083.eqiad.wmnet with OS bullseye
  • 10:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 10:35 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 10:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3005.esams.wmnet
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 10:29 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 10:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 10:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 10:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 10:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T401906)', diff saved to https://phabricator.wikimedia.org/P82369 and previous config saved to /var/cache/conftool/dbconfig/20250902-102353-fceratto.json
  • 10:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 10:21 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 10:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T401906)', diff saved to https://phabricator.wikimedia.org/P82368 and previous config saved to /var/cache/conftool/dbconfig/20250902-102130-fceratto.json
  • 10:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 10:19 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1083.eqiad.wmnet with reason: host reimage
  • 10:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 10:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 10:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 10:15 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1083.eqiad.wmnet with reason: host reimage
  • 10:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 09:38 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:33 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1085.eqiad.wmnet with OS bullseye
  • 09:31 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1084.eqiad.wmnet with OS bullseye
  • 09:29 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 09:29 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:28 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 09:27 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:27 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 09:27 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:26 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 09:24 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1083.eqiad.wmnet with OS bullseye
  • 09:23 mvernon@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ms-be1083.eqiad.wmnet with OS bullseye
  • 09:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir3005.esams.wmnet
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 09:23 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 09:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:19 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:17 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1085.eqiad.wmnet with reason: host reimage
  • 09:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:15 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3005.esams.wmnet
  • 09:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir3005.esams.wmnet
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 09:14 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:14 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1084.eqiad.wmnet with reason: host reimage
  • 09:13 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1085.eqiad.wmnet with reason: host reimage
  • 09:10 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1084.eqiad.wmnet with reason: host reimage
  • 09:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir3005.esams.wmnet on all recursors
  • 09:10 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir3005.esams.wmnet on all recursors
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:10 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir3005.esams.wmnet - jmm@cumin2002"
  • 09:06 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:06 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir3005.esams.wmnet
  • 08:54 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 08:50 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1085.eqiad.wmnet with OS bullseye
  • 08:47 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1084.eqiad.wmnet with OS bullseye
  • 08:47 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1083.eqiad.wmnet with OS bullseye
  • 07:54 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 07:41 kartik@deploy1003: Finished scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) (duration: 14m 12s)
  • 07:36 kartik@deploy1003: hueitan, kartik: Continuing with sync
  • 07:33 kartik@deploy1003: hueitan, kartik: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:27 kartik@deploy1003: Started scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)
  • 07:20 kartik@deploy1003: Finished scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) (duration: 16m 43s)
  • 07:19 moritzm: create ganeti03 cluster T402259
  • 07:13 kartik@deploy1003: hueitan, kartik: Continuing with sync
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 07:10 kartik@deploy1003: hueitan, kartik: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:04 kartik@deploy1003: Started scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)
  • 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for esams03 - jmm@cumin2002"
  • 06:46 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for esams03 - jmm@cumin2002"
  • 06:42 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:10 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone_es (exit_code=0) of es2026.codfw.wmnet onto es2049.codfw.wmnet
  • 06:10 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2026 gradually with 4 steps - Pool es2026.codfw.wmnet in after cloning
  • 05:25 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2026 gradually with 4 steps - Pool es2026.codfw.wmnet in after cloning
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.14 (duration: 01m 04s)
  • 03:47 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.17 refs T396378 (duration: 43m 50s)
  • 03:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2231 (T403362)', diff saved to https://phabricator.wikimedia.org/P82361 and previous config saved to /var/cache/conftool/dbconfig/20250902-034226-ladsgroup.json
  • 03:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2231', diff saved to https://phabricator.wikimedia.org/P82360 and previous config saved to /var/cache/conftool/dbconfig/20250902-032719-ladsgroup.json
  • 03:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2231', diff saved to https://phabricator.wikimedia.org/P82359 and previous config saved to /var/cache/conftool/dbconfig/20250902-031211-ladsgroup.json
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.17 refs T396378
  • 02:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2231 (T403362)', diff saved to https://phabricator.wikimedia.org/P82358 and previous config saved to /var/cache/conftool/dbconfig/20250902-025704-ladsgroup.json
  • 01:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2231 (T403362)', diff saved to https://phabricator.wikimedia.org/P82357 and previous config saved to /var/cache/conftool/dbconfig/20250902-015927-ladsgroup.json
  • 01:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2231.codfw.wmnet with reason: Maintenance
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 55s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:41 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance

2025-09-01

  • 23:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2196 (T403362)', diff saved to https://phabricator.wikimedia.org/P82356 and previous config saved to /var/cache/conftool/dbconfig/20250901-232330-ladsgroup.json
  • 23:17 eileen: config revision changed from 01b64dec to aebeab81 - jobs running again
  • 23:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2196', diff saved to https://phabricator.wikimedia.org/P82355 and previous config saved to /var/cache/conftool/dbconfig/20250901-230822-ladsgroup.json
  • 22:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2196', diff saved to https://phabricator.wikimedia.org/P82354 and previous config saved to /var/cache/conftool/dbconfig/20250901-225314-ladsgroup.json
  • 22:51 eileen: config revision changed from d10b733d to 01b64dec
  • 22:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2196 (T403362)', diff saved to https://phabricator.wikimedia.org/P82353 and previous config saved to /var/cache/conftool/dbconfig/20250901-223807-ladsgroup.json
  • 21:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2196 (T403362)', diff saved to https://phabricator.wikimedia.org/P82352 and previous config saved to /var/cache/conftool/dbconfig/20250901-214057-ladsgroup.json
  • 21:40 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2196.codfw.wmnet with reason: Maintenance
  • 21:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T403362)', diff saved to https://phabricator.wikimedia.org/P82351 and previous config saved to /var/cache/conftool/dbconfig/20250901-214034-ladsgroup.json
  • 21:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P82350 and previous config saved to /var/cache/conftool/dbconfig/20250901-212526-ladsgroup.json
  • 21:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P82348 and previous config saved to /var/cache/conftool/dbconfig/20250901-211019-ladsgroup.json
  • 20:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T403362)', diff saved to https://phabricator.wikimedia.org/P82347 and previous config saved to /var/cache/conftool/dbconfig/20250901-205511-ladsgroup.json
  • 20:32 jmm@cumin2002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 20:31 jmm@cumin2002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 19:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2191 (T403362)', diff saved to https://phabricator.wikimedia.org/P82346 and previous config saved to /var/cache/conftool/dbconfig/20250901-195545-ladsgroup.json
  • 19:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2186 (T403362)', diff saved to https://phabricator.wikimedia.org/P82345 and previous config saved to /var/cache/conftool/dbconfig/20250901-195522-ladsgroup.json
  • 19:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2186', diff saved to https://phabricator.wikimedia.org/P82342 and previous config saved to /var/cache/conftool/dbconfig/20250901-194014-ladsgroup.json
  • 19:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2186', diff saved to https://phabricator.wikimedia.org/P82341 and previous config saved to /var/cache/conftool/dbconfig/20250901-192507-ladsgroup.json
  • 19:23 XioNoX: cr1-esams> request chassis fpc slot 1 offline - T403360
  • 19:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2186 (T403362)', diff saved to https://phabricator.wikimedia.org/P82340 and previous config saved to /var/cache/conftool/dbconfig/20250901-190959-ladsgroup.json
  • 18:26 damilare: SmashPig upgraded from aa4ef732 to 6031b3c4
  • 18:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2186 (T403362)', diff saved to https://phabricator.wikimedia.org/P82339 and previous config saved to /var/cache/conftool/dbconfig/20250901-180958-ladsgroup.json
  • 18:09 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:59 jmm@cumin2002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 17:59 jmm@cumin2002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 16:26 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T403362)', diff saved to https://phabricator.wikimedia.org/P82338 and previous config saved to /var/cache/conftool/dbconfig/20250901-162552-ladsgroup.json
  • 16:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P82337 and previous config saved to /var/cache/conftool/dbconfig/20250901-161043-ladsgroup.json
  • 15:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P82336 and previous config saved to /var/cache/conftool/dbconfig/20250901-155535-ladsgroup.json
  • 15:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 15:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T401906)', diff saved to https://phabricator.wikimedia.org/P82335 and previous config saved to /var/cache/conftool/dbconfig/20250901-154111-fceratto.json
  • 15:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T403362)', diff saved to https://phabricator.wikimedia.org/P82334 and previous config saved to /var/cache/conftool/dbconfig/20250901-154028-ladsgroup.json
  • 15:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P82333 and previous config saved to /var/cache/conftool/dbconfig/20250901-152603-fceratto.json
  • 15:19 moritzm: installing luajit security updates
  • 15:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1237 (T403362)', diff saved to https://phabricator.wikimedia.org/P82332 and previous config saved to /var/cache/conftool/dbconfig/20250901-151757-ladsgroup.json
  • 15:17 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
  • 15:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P82327 and previous config saved to /var/cache/conftool/dbconfig/20250901-151056-fceratto.json
  • 14:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T401906)', diff saved to https://phabricator.wikimedia.org/P82326 and previous config saved to /var/cache/conftool/dbconfig/20250901-145548-fceratto.json
  • 14:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T401906)', diff saved to https://phabricator.wikimedia.org/P82325 and previous config saved to /var/cache/conftool/dbconfig/20250901-144211-fceratto.json
  • 14:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 14:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T401906)', diff saved to https://phabricator.wikimedia.org/P82324 and previous config saved to /var/cache/conftool/dbconfig/20250901-144148-fceratto.json
  • 14:38 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20250101000000 # T313900
  • 14:32 logmsgbot: dreamyjazz Deployed security patch for T403289
  • 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P82323 and previous config saved to /var/cache/conftool/dbconfig/20250901-142641-fceratto.json
  • 14:20 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20240101000000 --until=20250101000000 # T313900
  • 14:18 damilare: civicrm upgraded from ddac1aee to cac8d439
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P82322 and previous config saved to /var/cache/conftool/dbconfig/20250901-141133-fceratto.json
  • 14:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3006.esams.wmnet with OS bookworm
  • 14:02 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20230101000000 --until=20240101000000 # T313900
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T401906)', diff saved to https://phabricator.wikimedia.org/P82321 and previous config saved to /var/cache/conftool/dbconfig/20250901-135626-fceratto.json
  • 13:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3006.esams.wmnet with reason: host reimage
  • 13:47 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 13:45 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --fix --since=20220310000000 --until=20230101000000 # T313900
  • 13:44 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 13:44 jmm@cumin2002: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3006.esams.wmnet with reason: host reimage
  • 13:43 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 13:42 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki # T398177 (dry run)
  • 13:42 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Add caller to maintenance script SQL queries (T313900 T398177 T403387), FixRenameUserLocalLogs: Batch more queries to speed up the script (T398177), FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' (T398177) (duration: 11m 48s)
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T401906)', diff saved to https://phabricator.wikimedia.org/P82320 and previous config saved to /var/cache/conftool/dbconfig/20250901-134148-fceratto.json
  • 13:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T401906)', diff saved to https://phabricator.wikimedia.org/P82319 and previous config saved to /var/cache/conftool/dbconfig/20250901-134125-fceratto.json
  • 13:37 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex, d3r1ck01: Continuing with sync
  • 13:36 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex, d3r1ck01: Backport for Add caller to maintenance script SQL queries (T313900 T398177 T403387), FixRenameUserLocalLogs: Batch more queries to speed up the script (T398177), FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' (T398177) synced to the testservers (see http
  • 13:33 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2026 - Depool es2026.codfw.wmnet to then clone it to es2049.codfw.wmnet - fceratto@cumin1002
  • 13:33 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 13:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T403362)', diff saved to https://phabricator.wikimedia.org/P82317 and previous config saved to /var/cache/conftool/dbconfig/20250901-133314-ladsgroup.json
  • 13:33 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2026 - Depool es2026.codfw.wmnet to then clone it to es2049.codfw.wmnet - fceratto@cumin1002
  • 13:33 fceratto@cumin1002: START - Cookbook sre.mysql.clone_es of es2026.codfw.wmnet onto es2049.codfw.wmnet
  • 13:30 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Add caller to maintenance script SQL queries (T313900 T398177 T403387), FixRenameUserLocalLogs: Batch more queries to speed up the script (T398177), FixRenameUserLocalLogs: Skip rows where the performer is 'Global rename script' (T398177)
  • 13:30 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496), Enable electionclerk user group on fawiki (T396347) (duration: 14m 53s)
  • 13:27 tappof: Add 15G to prometheus-k8s-dse lv
  • 13:26 damilare: civicrm upgraded from ddac1aee to 88f17089
  • 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P82315 and previous config saved to /var/cache/conftool/dbconfig/20250901-132617-fceratto.json
  • 13:24 lucaswerkmeister-wmde@deploy1003: huji, hueitan, lucaswerkmeister-wmde: Continuing with sync
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3006.esams.wmnet with OS bookworm
  • 13:20 lucaswerkmeister-wmde@deploy1003: huji, hueitan, lucaswerkmeister-wmde: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496), Enable electionclerk user group on fawiki (T396347) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P82314 and previous config saved to /var/cache/conftool/dbconfig/20250901-131807-ladsgroup.json
  • 13:15 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496), Enable electionclerk user group on fawiki (T396347)
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 13:13 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for SUL3: Use `metawiki` as central wiki (T402527) (duration: 09m 36s)
  • 13:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P82313 and previous config saved to /var/cache/conftool/dbconfig/20250901-131110-fceratto.json
  • 13:08 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
  • 13:07 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Backport for SUL3: Use `metawiki` as central wiki (T402527) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti3006.esams.wmnet
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 13:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Use `metawiki` as central wiki (T402527)
  • 13:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P82312 and previous config saved to /var/cache/conftool/dbconfig/20250901-130259-ladsgroup.json
  • 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T401906)', diff saved to https://phabricator.wikimedia.org/P82311 and previous config saved to /var/cache/conftool/dbconfig/20250901-125602-fceratto.json
  • 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 12:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T403362)', diff saved to https://phabricator.wikimedia.org/P82310 and previous config saved to /var/cache/conftool/dbconfig/20250901-124751-ladsgroup.json
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T401906)', diff saved to https://phabricator.wikimedia.org/P82309 and previous config saved to /var/cache/conftool/dbconfig/20250901-124223-fceratto.json
  • 12:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T401906)', diff saved to https://phabricator.wikimedia.org/P82308 and previous config saved to /var/cache/conftool/dbconfig/20250901-124211-fceratto.json
  • 12:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P82307 and previous config saved to /var/cache/conftool/dbconfig/20250901-122704-fceratto.json
  • 12:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti3006.esams.wmnet
  • 12:23 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti3006.esams.wmnet
  • 12:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti3006.esams.wmnet
  • 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P82306 and previous config saved to /var/cache/conftool/dbconfig/20250901-121156-fceratto.json
  • 11:59 ladsgroup@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on commonswiki (T399579) (duration: 12m 15s)
  • 11:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T401906)', diff saved to https://phabricator.wikimedia.org/P82305 and previous config saved to /var/cache/conftool/dbconfig/20250901-115649-fceratto.json
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus3003.esams.wmnet to plain
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus3003.esams.wmnet to plain
  • 11:54 ladsgroup@deploy1003: ladsgroup, zabe: Continuing with sync
  • 11:53 ladsgroup@deploy1003: ladsgroup, zabe: Backport for Stop writing to cl_to and cl_collation on commonswiki (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow3003.esams.wmnet to plain
  • 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow3003.esams.wmnet to plain
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh3004.wikimedia.org to plain
  • 11:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh3004.wikimedia.org to plain
  • 11:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1224 (T403362)', diff saved to https://phabricator.wikimedia.org/P82304 and previous config saved to /var/cache/conftool/dbconfig/20250901-114725-ladsgroup.json
  • 11:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 11:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on commonswiki (T399579)
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum3004.esams.wmnet to plain
  • 11:45 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum3004.esams.wmnet to plain
  • 11:45 ayounsi@dns1004: END - running authdns-update
  • 11:44 ayounsi@dns1004: START - running authdns-update
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T401906)', diff saved to https://phabricator.wikimedia.org/P82303 and previous config saved to /var/cache/conftool/dbconfig/20250901-114310-fceratto.json
  • 11:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T401906)', diff saved to https://phabricator.wikimedia.org/P82302 and previous config saved to /var/cache/conftool/dbconfig/20250901-114247-fceratto.json
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir3004.esams.wmnet to plain
  • 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir3004.esams.wmnet to plain
  • 11:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams v4 routed ganeti IPs - ayounsi@cumin1003"
  • 11:40 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: esams v4 routed ganeti IPs - ayounsi@cumin1003"
  • 11:36 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 11:36 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:32 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P82301 and previous config saved to /var/cache/conftool/dbconfig/20250901-112739-fceratto.json
  • 11:27 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ms-be[1083-1085].eqiad.wmnet with reason: awaiting controller swap
  • 11:26 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:25 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install3003.wikimedia.org to plain
  • 11:24 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install3003.wikimedia.org to plain
  • 11:22 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 11:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 11:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 11:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for ParserTestRunner: Update category counts for articles (T365303), CategoryCacheTest: Update category count, Drop support for categorylinks read old (T299951 T403147 T403337) (duration: 12m 28s)
  • 11:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P82300 and previous config saved to /var/cache/conftool/dbconfig/20250901-111232-fceratto.json
  • 11:11 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:09 ladsgroup@deploy1003: ladsgroup: Backport for ParserTestRunner: Update category counts for articles (T365303), CategoryCacheTest: Update category count, Drop support for categorylinks read old (T299951 T403147 T403337) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:04 ladsgroup@deploy1003: Started scap sync-world: Backport for ParserTestRunner: Update category counts for articles (T365303), CategoryCacheTest: Update category count, Drop support for categorylinks read old (T299951 T403147 T403337)
  • 10:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T401906)', diff saved to https://phabricator.wikimedia.org/P82299 and previous config saved to /var/cache/conftool/dbconfig/20250901-105724-fceratto.json
  • 10:45 moritzm: installing luajit security updates
  • 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T401906)', diff saved to https://phabricator.wikimedia.org/P82298 and previous config saved to /var/cache/conftool/dbconfig/20250901-104407-fceratto.json
  • 10:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T401906)', diff saved to https://phabricator.wikimedia.org/P82297 and previous config saved to /var/cache/conftool/dbconfig/20250901-104345-fceratto.json
  • 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P82296 and previous config saved to /var/cache/conftool/dbconfig/20250901-102837-fceratto.json
  • 10:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P82295 and previous config saved to /var/cache/conftool/dbconfig/20250901-101330-fceratto.json
  • 10:07 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 10:06 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 10:05 jmm@cumin2002: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 10:04 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 10:01 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T403362)', diff saved to https://phabricator.wikimedia.org/P82294 and previous config saved to /var/cache/conftool/dbconfig/20250901-100054-ladsgroup.json
  • 09:58 dcausse@deploy1003: Finished scap sync-world: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220) (duration: 11m 12s)
  • 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T401906)', diff saved to https://phabricator.wikimedia.org/P82293 and previous config saved to /var/cache/conftool/dbconfig/20250901-095822-fceratto.json
  • 09:53 dcausse@deploy1003: dcausse: Continuing with sync
  • 09:52 dcausse@deploy1003: dcausse: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:47 dcausse@deploy1003: Started scap sync-world: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)
  • 09:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:47 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P82292 and previous config saved to /var/cache/conftool/dbconfig/20250901-094547-ladsgroup.json
  • 09:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T401906)', diff saved to https://phabricator.wikimedia.org/P82291 and previous config saved to /var/cache/conftool/dbconfig/20250901-094504-fceratto.json
  • 09:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T401906)', diff saved to https://phabricator.wikimedia.org/P82290 and previous config saved to /var/cache/conftool/dbconfig/20250901-094442-fceratto.json
  • 09:43 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:43 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:41 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:38 dcausse@deploy1003: dcausse: Continuing with sync
  • 09:33 dcausse@deploy1003: dcausse: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P82289 and previous config saved to /var/cache/conftool/dbconfig/20250901-093039-ladsgroup.json
  • 09:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P82288 and previous config saved to /var/cache/conftool/dbconfig/20250901-092934-fceratto.json
  • 09:27 dcausse@deploy1003: Started scap sync-world: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)
  • 09:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 09:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 09:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti3005.esams.wmnet with OS bookworm
  • 09:24 dcausse@deploy1003: Finished scap sync-world: Backport for hCaptcha: Provide label/help in authmanagerinfo API calls (T403253) (duration: 16m 15s)
  • 09:19 dcausse@deploy1003: kharlan, dcausse: Continuing with sync
  • 09:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T403362)', diff saved to https://phabricator.wikimedia.org/P82287 and previous config saved to /var/cache/conftool/dbconfig/20250901-091531-ladsgroup.json
  • 09:14 dcausse@deploy1003: kharlan, dcausse: Backport for hCaptcha: Provide label/help in authmanagerinfo API calls (T403253) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P82286 and previous config saved to /var/cache/conftool/dbconfig/20250901-091427-fceratto.json
  • 09:08 dcausse@deploy1003: Started scap sync-world: Backport for hCaptcha: Provide label/help in authmanagerinfo API calls (T403253)
  • 09:06 dcausse@deploy1003: Finished scap sync-world: Backport for Lift permission for event-organizer in Chinese Wikipedia (T403350) (duration: 14m 20s)
  • 09:01 dcausse@deploy1003: hamishz, dcausse: Continuing with sync
  • 08:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T401906)', diff saved to https://phabricator.wikimedia.org/P82285 and previous config saved to /var/cache/conftool/dbconfig/20250901-085920-fceratto.json
  • 08:58 dcausse@deploy1003: hamishz, dcausse: Backport for Lift permission for event-organizer in Chinese Wikipedia (T403350) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:52 dcausse@deploy1003: Started scap sync-world: Backport for Lift permission for event-organizer in Chinese Wikipedia (T403350)
  • 08:51 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 08:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2049.codfw.wmnet with reason: T402859
  • 08:46 dcausse@deploy1003: Finished scap sync-world: Backport for Revert "wikimaniawiki: update logo to 2025" (T403148), Remove setting `wgEnablePartialActionBlocks`. (T280532) (duration: 12m 05s)
  • 08:46 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T401906)', diff saved to https://phabricator.wikimedia.org/P82284 and previous config saved to /var/cache/conftool/dbconfig/20250901-084558-fceratto.json
  • 08:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1179 (T403362)', diff saved to https://phabricator.wikimedia.org/P82283 and previous config saved to /var/cache/conftool/dbconfig/20250901-084254-ladsgroup.json
  • 08:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 08:42 jmm@cumin2002: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 08:41 jmm@cumin2002: START - Cookbook sre.postgresql.postgres-init
  • 08:40 dcausse@deploy1003: mszwarc, dcausse: Continuing with sync
  • 08:39 dcausse@deploy1003: mszwarc, dcausse: Backport for Revert "wikimaniawiki: update logo to 2025" (T403148), Remove setting `wgEnablePartialActionBlocks`. (T280532) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:36 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2049.codfw.wmnet
  • 08:34 dcausse@deploy1003: Started scap sync-world: Backport for Revert "wikimaniawiki: update logo to 2025" (T403148), Remove setting `wgEnablePartialActionBlocks`. (T280532)
  • 08:25 kartik@deploy1003: Finished scap sync-world: Backport for Update HomepageVisit schema to 1.6.1 (T402496 T402497) (duration: 15m 21s)
  • 08:20 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for es2049.codfw.wmnet
  • 08:20 kartik@deploy1003: hueitan, kartik: Continuing with sync
  • 08:16 kartik@deploy1003: hueitan, kartik: Backport for Update HomepageVisit schema to 1.6.1 (T402496 T402497) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:09 kartik@deploy1003: Started scap sync-world: Backport for Update HomepageVisit schema to 1.6.1 (T402496 T402497)
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bookworm
  • 07:59 kartik@deploy1003: Finished scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) (duration: 29m 29s)
  • 07:51 kartik@deploy1003: kartik, hueitan: Continuing with sync
  • 07:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts atlas3001.wikimedia.org
  • 07:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:44 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
  • 07:44 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas3001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
  • 07:40 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 07:40 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on people1005.eqiad.wmnet with reason: WIP T402953#11120672
  • 07:36 ayounsi@cumin1003: START - Cookbook sre.hosts.decommission for hosts atlas3001.wikimedia.org
  • 07:35 kartik@deploy1003: kartik, hueitan: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:32 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts atlas3001.wikimedia.org
  • 07:31 ayounsi@cumin1003: START - Cookbook sre.hosts.decommission for hosts atlas3001.wikimedia.org
  • 07:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:30 kartik@deploy1003: Started scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)
  • 07:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:26 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh3003.wikimedia.org to plain
  • 07:26 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh3003.wikimedia.org to plain
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum3003.esams.wmnet to plain
  • 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum3003.esams.wmnet to plain
  • 07:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:17 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Disable hCaptcha for API contexts (T403263) (duration: 43m 11s)
  • 07:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir3003.esams.wmnet to plain
  • 07:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir3003.esams.wmnet to plain
  • 07:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:11 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 07:09 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 07:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast3007.wikimedia.org
  • 07:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast3007.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:04 kharlan@deploy1003: kharlan: Continuing with sync
  • 07:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:02 kharlan@deploy1003: kharlan: Backport for hCaptcha: Disable hCaptcha for API contexts (T403263) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:58 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast3007.wikimedia.org
  • 06:55 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Máté Szabó out of all services on: 2410 hosts
  • 06:55 dcausse: restarting blazegraph on wdqs1011 (stuck)
  • 06:34 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Disable hCaptcha for API contexts (T403263)
