Server Admin Log

From Wikitech

2024-10-22

  • 22:59 ejegg: fundraising civicrm upgraded from 36660cb3 to d9e85c3d
  • 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70562 and previous config saved to /var/cache/conftool/dbconfig/20241022-223858-ladsgroup.json
  • 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70561 and previous config saved to /var/cache/conftool/dbconfig/20241022-222352-ladsgroup.json
  • 22:11 zabe@deploy2002: Finished scap sync-world: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 17s)
  • 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70560 and previous config saved to /var/cache/conftool/dbconfig/20241022-220847-ladsgroup.json
  • 22:07 zabe@deploy2002: zabe: Continuing with sync
  • 22:06 zabe@deploy2002: zabe: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:03 zabe@deploy2002: Started scap sync-world: Backport for s1: Reduce revision-slots cache expiry to 60 seconds (T183490)
  • 21:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P70559 and previous config saved to /var/cache/conftool/dbconfig/20241022-215137-ladsgroup.json
  • 21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet
  • 21:44 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet
  • 21:44 dancy@deploy2002: Installation of scap version "4.117.0" completed for 209 hosts
  • 21:40 dancy@deploy2002: Installing scap version "4.117.0" for 209 hosts
  • 21:01 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@b08d130] (releasing): Deploying changes to single-version MediaWiki image build (duration: 01m 44s)
  • 21:00 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@b08d130] (releasing): Deploying changes to single-version MediaWiki image build
  • 20:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 20:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70558 and previous config saved to /var/cache/conftool/dbconfig/20241022-202717-ladsgroup.json
  • 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P70557 and previous config saved to /var/cache/conftool/dbconfig/20241022-201210-ladsgroup.json
  • 19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P70556 and previous config saved to /var/cache/conftool/dbconfig/20241022-195703-ladsgroup.json
  • 19:54 swfrench-wmf: running puppet on A:cp-text (-b11) after validating ATS Lua changes on cp4040 - T372605
  • 19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70555 and previous config saved to /var/cache/conftool/dbconfig/20241022-194156-ladsgroup.json
  • 19:40 swfrench-wmf: disabling puppet on A:cp-text before merging ATS Lua changes - T372605
  • 19:39 ladsgroup@deploy2002: Finished scap sync-world: Backport for Fix duplicated key in wgVectorNightMode (duration: 07m 51s)
  • 19:36 ladsgroup@deploy2002: ladsgroup, ebrahim: Continuing with sync
  • 19:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T376905)', diff saved to https://phabricator.wikimedia.org/P70554 and previous config saved to /var/cache/conftool/dbconfig/20241022-193352-ladsgroup.json
  • 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@deploy2002: ladsgroup, ebrahim: Backport for Fix duplicated key in wgVectorNightMode synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70553 and previous config saved to /var/cache/conftool/dbconfig/20241022-193327-ladsgroup.json
  • 19:31 ladsgroup@deploy2002: Started scap sync-world: Backport for Fix duplicated key in wgVectorNightMode
  • 19:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P70552 and previous config saved to /var/cache/conftool/dbconfig/20241022-191820-ladsgroup.json
  • 19:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P70551 and previous config saved to /var/cache/conftool/dbconfig/20241022-190313-ladsgroup.json
  • 19:00 dduvall@deploy2002: Installation of scap version "4.116.0" completed for 209 hosts
  • 18:56 dduvall@deploy2002: Installing scap version "4.116.0" for 209 hosts
  • 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70550 and previous config saved to /var/cache/conftool/dbconfig/20241022-184946-arnaudb.json
  • 18:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70549 and previous config saved to /var/cache/conftool/dbconfig/20241022-184806-ladsgroup.json
  • 18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T376905)', diff saved to https://phabricator.wikimedia.org/P70548 and previous config saved to /var/cache/conftool/dbconfig/20241022-183955-ladsgroup.json
  • 18:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70547 and previous config saved to /var/cache/conftool/dbconfig/20241022-183930-ladsgroup.json
  • 18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70546 and previous config saved to /var/cache/conftool/dbconfig/20241022-183440-arnaudb.json
  • 18:26 dancy@deploy2002: sync-world aborted: Refreshing (duration: 01m 33s)
  • 18:24 dancy@deploy2002: Started scap sync-world: Refreshing
  • 18:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P70544 and previous config saved to /var/cache/conftool/dbconfig/20241022-182423-ladsgroup.json
  • 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70543 and previous config saved to /var/cache/conftool/dbconfig/20241022-181933-arnaudb.json
  • 18:17 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.28 refs T375659
  • 18:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P70542 and previous config saved to /var/cache/conftool/dbconfig/20241022-180916-ladsgroup.json
  • 18:09 dancy@deploy2002: Finished scap sync-world: Backport for Prevent blocked users from being able to review/unreview articles (T366991) (duration: 07m 26s)
  • 18:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70541 and previous config saved to /var/cache/conftool/dbconfig/20241022-180426-arnaudb.json
  • 18:04 dancy@deploy2002: dancy, sbassett: Continuing with sync
  • 18:04 dancy@deploy2002: dancy, sbassett: Backport for Prevent blocked users from being able to review/unreview articles (T366991) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:01 dancy@deploy2002: Started scap sync-world: Backport for Prevent blocked users from being able to review/unreview articles (T366991)
  • 17:54 sukhe: sudo cumin -b4 "A:cp-upload" 'run-puppet-agent --enable "merging CR 1078994"': T375761
  • 17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70540 and previous config saved to /var/cache/conftool/dbconfig/20241022-175409-ladsgroup.json
  • 17:50 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@16eb792] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/90 (duration: 01m 21s)
  • 17:49 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@16eb792] (releasing): Deploying https://gitlab.wikimedia.org/repos/releng/jenkins-deploy/-/merge_requests/90
  • 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T376905)', diff saved to https://phabricator.wikimedia.org/P70539 and previous config saved to /var/cache/conftool/dbconfig/20241022-174555-ladsgroup.json
  • 17:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 17:45 sukhe: sudo cumin "A:cp-upload" 'disable-puppet "merging CR 1078994"': T375761
  • 17:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70538 and previous config saved to /var/cache/conftool/dbconfig/20241022-174530-ladsgroup.json
  • 17:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P70537 and previous config saved to /var/cache/conftool/dbconfig/20241022-173022-ladsgroup.json
  • 17:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2014.codfw.wmnet
  • 17:23 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
  • 17:18 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2014.codfw.wmnet with reason: rebooting to test changes rolled out in CR 1006063
  • 17:17 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs2014.codfw.wmnet with reason: rebooting to test changes rolled out in CR 1006063
  • 17:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P70536 and previous config saved to /var/cache/conftool/dbconfig/20241022-171515-ladsgroup.json
  • 17:14 sukhe: re-enable Puppet on A:lvs [change merged on lvs2014]: T358260
  • 17:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: repooling sessionstore post mesh migration T363996
  • 17:04 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: repooling sessionstore post mesh migration T363996
  • 17:04 sukhe: disable Puppet on A:lvs to merge 1006063: T358260
  • 17:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70535 and previous config saved to /var/cache/conftool/dbconfig/20241022-170400-arnaudb.json
  • 17:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 17:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70534 and previous config saved to /var/cache/conftool/dbconfig/20241022-170337-arnaudb.json
  • 17:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70533 and previous config saved to /var/cache/conftool/dbconfig/20241022-170008-ladsgroup.json
  • 16:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70532 and previous config saved to /var/cache/conftool/dbconfig/20241022-165211-ladsgroup.json
  • 16:52 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1176.eqiad.wmnet
  • 16:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70531 and previous config saved to /var/cache/conftool/dbconfig/20241022-165147-ladsgroup.json
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70530 and previous config saved to /var/cache/conftool/dbconfig/20241022-164830-arnaudb.json
  • 16:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 16:46 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 16:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 16:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 16:44 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 16:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P70529 and previous config saved to /var/cache/conftool/dbconfig/20241022-163639-ladsgroup.json
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70528 and previous config saved to /var/cache/conftool/dbconfig/20241022-163323-arnaudb.json
  • 16:31 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1176.eqiad.wmnet
  • 16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P70527 and previous config saved to /var/cache/conftool/dbconfig/20241022-162132-ladsgroup.json
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70526 and previous config saved to /var/cache/conftool/dbconfig/20241022-161816-arnaudb.json
  • 16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70525 and previous config saved to /var/cache/conftool/dbconfig/20241022-161604-arnaudb.json
  • 16:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70524 and previous config saved to /var/cache/conftool/dbconfig/20241022-161552-arnaudb.json
  • 16:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: testing sessionstore mesh migration
  • 16:08 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: testing sessionstore mesh migration
  • 16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70523 and previous config saved to /var/cache/conftool/dbconfig/20241022-160625-ladsgroup.json
  • 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70522 and previous config saved to /var/cache/conftool/dbconfig/20241022-160045-arnaudb.json
  • 15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T376905)', diff saved to https://phabricator.wikimedia.org/P70521 and previous config saved to /var/cache/conftool/dbconfig/20241022-155824-ladsgroup.json
  • 15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70520 and previous config saved to /var/cache/conftool/dbconfig/20241022-155759-ladsgroup.json
  • 15:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 15:54 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
  • 15:53 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 15:53 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) check sessionstore: maintenance
  • 15:53 hnowlan@cumin1002: START - Cookbook sre.discovery.service-route check sessionstore: maintenance
  • 15:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70519 and previous config saved to /var/cache/conftool/dbconfig/20241022-154538-arnaudb.json
  • 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P70518 and previous config saved to /var/cache/conftool/dbconfig/20241022-154251-ladsgroup.json
  • 15:39 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:38 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:38 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:38 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:38 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:38 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:37 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:37 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:36 sbassett@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:36 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:36 sbassett@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:35 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 15:31 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:30 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70517 and previous config saved to /var/cache/conftool/dbconfig/20241022-153031-arnaudb.json
  • 15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P70516 and previous config saved to /var/cache/conftool/dbconfig/20241022-152743-ladsgroup.json
  • 15:19 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:19 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:18 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:15 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 15:14 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestagemaster2003.codfw.wmnet
  • 15:14 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestagemaster2003.codfw.wmnet
  • 15:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70515 and previous config saved to /var/cache/conftool/dbconfig/20241022-151237-ladsgroup.json
  • 15:11 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@7c2d65f]: DPE 2024-10-22 deployment train (duration: 01m 16s)
  • 15:10 gmodena@deploy2002: Started deploy [airflow-dags/analytics@7c2d65f]: DPE 2024-10-22 deployment train
  • 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@582cde5]: deploy phab1004 for T377850 (duration: 01m 04s)
  • 15:08 brennen@deploy2002: Started deploy [phabricator/deployment@582cde5]: deploy phab1004 for T377850
  • 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@582cde5]: test deploy phab2002 for T377850 (may fail, expected) (duration: 00m 24s)
  • 15:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:07 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator deployment
  • 15:07 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator deployment
  • 15:07 brennen@deploy2002: Started deploy [phabricator/deployment@582cde5]: test deploy phab2002 for T377850 (may fail, expected)
  • 15:06 eoghan@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: Phabricator deployment
  • 15:06 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: Phabricator deployment
  • 15:06 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deployment
  • 15:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:06 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deployment
  • 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70514 and previous config saved to /var/cache/conftool/dbconfig/20241022-150435-ladsgroup.json
  • 15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70513 and previous config saved to /var/cache/conftool/dbconfig/20241022-150409-ladsgroup.json
  • 14:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70512 and previous config saved to /var/cache/conftool/dbconfig/20241022-145653-arnaudb.json
  • 14:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:52 hashar@deploy2002: Finished deploy [gerrit/gerrit@30691f2]: Update patch demo to recognize both legacy and new URLs - T374954 (duration: 00m 10s)
  • 14:52 hashar@deploy2002: Started deploy [gerrit/gerrit@30691f2]: Update patch demo to recognize both legacy and new URLs - T374954
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P70511 and previous config saved to /var/cache/conftool/dbconfig/20241022-144902-ladsgroup.json
  • 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70510 and previous config saved to /var/cache/conftool/dbconfig/20241022-144148-arnaudb.json
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 14:40 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2084 to codfw - jhancock@cumin2002"
  • 14:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2084 to codfw - jhancock@cumin2002"
  • 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70509 and previous config saved to /var/cache/conftool/dbconfig/20241022-143628-arnaudb.json
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 14:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 14:34 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P70507 and previous config saved to /var/cache/conftool/dbconfig/20241022-143355-ladsgroup.json
  • 14:32 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Fix performer link on Special:GlobalBlockList (T377398) (duration: 07m 43s)
  • 14:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70506 and previous config saved to /var/cache/conftool/dbconfig/20241022-143005-arnaudb.json
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:27 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 14:27 dreamyjazz@deploy2002: dreamyjazz: Backport for Fix performer link on Special:GlobalBlockList (T377398) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70505 and previous config saved to /var/cache/conftool/dbconfig/20241022-142642-arnaudb.json
  • 14:24 dreamyjazz@deploy2002: Started scap sync-world: Backport for Fix performer link on Special:GlobalBlockList (T377398)
  • 14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70504 and previous config saved to /var/cache/conftool/dbconfig/20241022-142123-arnaudb.json
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70503 and previous config saved to /var/cache/conftool/dbconfig/20241022-141848-ladsgroup.json
  • 14:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70502 and previous config saved to /var/cache/conftool/dbconfig/20241022-141137-arnaudb.json
  • 14:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 14:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2011.codfw.wmnet
  • 14:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70501 and previous config saved to /var/cache/conftool/dbconfig/20241022-140956-ladsgroup.json
  • 14:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 14:09 ejegg: payments-wiki upgraded from 7ae3479f to a039cd50
  • 14:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70500 and previous config saved to /var/cache/conftool/dbconfig/20241022-140931-ladsgroup.json
  • 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70499 and previous config saved to /var/cache/conftool/dbconfig/20241022-140617-arnaudb.json
  • 14:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 13:59 moritzm: rebalance ganeti clusters in magru following reboots
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
  • 13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70498 and previous config saved to /var/cache/conftool/dbconfig/20241022-135631-arnaudb.json
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P70497 and previous config saved to /var/cache/conftool/dbconfig/20241022-135424-ladsgroup.json
  • 13:52 Lucas_WMDE: UTC afternoon backport+window done (a further GlobalBlocking fix will be backported out-of-window soon)
  • 13:51 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (hadoop-test): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 03m 17s)
  • 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70496 and previous config saved to /var/cache/conftool/dbconfig/20241022-135112-arnaudb.json
  • 13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
  • 13:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
  • 13:48 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (hadoop-test): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
  • 13:48 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 00m 07s)
  • 13:48 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
  • 13:47 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 00m 57s)
  • 13:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:46 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
  • 13:45 aqu@deploy2002: deploy aborted: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 03m 50s)
  • 13:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:44 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315) (duration: 08m 40s)
  • 13:41 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a] (thin): Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
  • 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70495 and previous config saved to /var/cache/conftool/dbconfig/20241022-134126-arnaudb.json
  • 13:40 lucaswerkmeister-wmde@deploy2002: joelyrookewmde, lucaswerkmeister-wmde: Continuing with sync
  • 13:39 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2002.codfw.wmnet with OS bullseye
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P70494 and previous config saved to /var/cache/conftool/dbconfig/20241022-133916-ladsgroup.json
  • 13:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:37 lucaswerkmeister-wmde@deploy2002: joelyrookewmde, lucaswerkmeister-wmde: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:35 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Activate feature flag to default move wikibase sidebar link to other projects. (T66315)
  • 13:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2227.codfw.wmnet
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
  • 13:32 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398) (duration: 15m 27s)
  • 13:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:29 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: T377718', diff saved to https://phabricator.wikimedia.org/P70493 and previous config saved to /var/cache/conftool/dbconfig/20241022-132745-arnaudb.json
  • 13:25 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70492 and previous config saved to /var/cache/conftool/dbconfig/20241022-132409-ladsgroup.json
  • 13:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 13:19 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
  • 13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dreamyjazz: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
  • 13:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Don't escape performer link HTML in GlobalBlockDetailsRenderer (T377398)
  • 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T376905)', diff saved to https://phabricator.wikimedia.org/P70491 and previous config saved to /var/cache/conftool/dbconfig/20241022-131448-ladsgroup.json
  • 13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 13:14 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Release CampaignEvents to eswiki (T376786) (duration: 09m 35s)
  • 13:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70490 and previous config saved to /var/cache/conftool/dbconfig/20241022-131415-ladsgroup.json
  • 13:14 aqu@deploy2002: Finished deploy [analytics/refinery@ffc985a]: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7] (duration: 19m 41s)
  • 13:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: T377718', diff saved to https://phabricator.wikimedia.org/P70489 and previous config saved to /var/cache/conftool/dbconfig/20241022-131239-arnaudb.json
  • 13:09 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync
  • 13:07 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for Release CampaignEvents to eswiki (T376786) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Release CampaignEvents to eswiki (T376786)
  • 13:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P70488 and previous config saved to /var/cache/conftool/dbconfig/20241022-125908-ladsgroup.json
  • 12:58 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2002.codfw.wmnet with reason: host reimage
  • 12:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: T377718', diff saved to https://phabricator.wikimedia.org/P70487 and previous config saved to /var/cache/conftool/dbconfig/20241022-125734-arnaudb.json
  • 12:55 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2089.codfw.wmnet with OS bookworm
  • 12:54 aqu@deploy2002: Started deploy [analytics/refinery@ffc985a]: Adding refinery/source 0.2.49.2 & 0.2.53 [analytics/refinery@ffc985a7]
  • 12:53 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2086.codfw.wmnet with OS bookworm
  • 12:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2085.codfw.wmnet with OS bookworm
  • 12:45 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2088.codfw.wmnet with OS bookworm
  • 12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P70486 and previous config saved to /var/cache/conftool/dbconfig/20241022-124401-ladsgroup.json
  • 12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: T377718', diff saved to https://phabricator.wikimedia.org/P70485 and previous config saved to /var/cache/conftool/dbconfig/20241022-124228-arnaudb.json
  • 12:42 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host gitlab-runner2002
  • 12:42 jelto@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host gitlab-runner2002
  • 12:41 jelto@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host gitlab-runner2002
  • 12:41 jelto@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gitlab-runner2002.codfw.wmnet 161.16.192.10.in-addr.arpa 1.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:41 jelto@cumin1002: START - Cookbook sre.dns.wipe-cache gitlab-runner2002.codfw.wmnet 161.16.192.10.in-addr.arpa 1.6.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:41 jelto@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 jelto@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2002 - jelto@cumin1002"
  • 12:41 jelto@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host gitlab-runner2002 - jelto@cumin1002"
  • 12:37 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
  • 12:37 jelto@cumin1002: START - Cookbook sre.dns.netbox
  • 12:36 jelto@cumin1002: START - Cookbook sre.hosts.move-vlan for host gitlab-runner2002
  • 12:36 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner2002.codfw.wmnet with OS bullseye
  • 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:34 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2086.codfw.wmnet with reason: host reimage
  • 12:34 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2089.codfw.wmnet with reason: host reimage
  • 12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:31 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2085.codfw.wmnet with reason: host reimage
  • 12:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70484 and previous config saved to /var/cache/conftool/dbconfig/20241022-122854-ladsgroup.json
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2010.codfw.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2010.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2088.codfw.wmnet with reason: host reimage
  • 12:27 Dreamy_Jazz: Running MediaModeration scan on all group2 wikis
  • 12:27 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2086.codfw.wmnet with reason: host reimage
  • 12:27 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2085.codfw.wmnet with reason: host reimage
  • 12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: T377718', diff saved to https://phabricator.wikimedia.org/P70483 and previous config saved to /var/cache/conftool/dbconfig/20241022-122723-arnaudb.json
  • 12:27 Dreamy_Jazz: Stopped MediaModeration scan on all group1 wikis
  • 12:24 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2088.codfw.wmnet with reason: host reimage
  • 12:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2010.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:20 Dreamy_Jazz: Running MediaModeration scan on all group1 wikis
  • 12:20 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin2002
  • 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70482 and previous config saved to /var/cache/conftool/dbconfig/20241022-121928-ladsgroup.json
  • 12:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70481 and previous config saved to /var/cache/conftool/dbconfig/20241022-121903-ladsgroup.json
  • 12:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:12 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2227.codfw.wmnet
  • 12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: T377718', diff saved to https://phabricator.wikimedia.org/P70480 and previous config saved to /var/cache/conftool/dbconfig/20241022-121218-arnaudb.json
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2010.codfw.wmnet
  • 12:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2089.codfw.wmnet with OS bookworm
  • 12:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2149,2227].codfw.wmnet with reason: maintenance
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2009.codfw.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 12:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2149,2227].codfw.wmnet with reason: maintenance
  • 12:08 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2088.codfw.wmnet with OS bookworm
  • 12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 and db2227 - T377718', diff saved to https://phabricator.wikimedia.org/P70479 and previous config saved to /var/cache/conftool/dbconfig/20241022-120753-arnaudb.json
  • 12:06 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2086.codfw.wmnet with OS bookworm
  • 12:06 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2085.codfw.wmnet with OS bookworm
  • 12:05 Dreamy_Jazz: Running MediaModeration scan on all group0 wikis
  • 12:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P70478 and previous config saved to /var/cache/conftool/dbconfig/20241022-120356-ladsgroup.json
  • 12:03 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778) (duration: 15m 27s)
  • 12:02 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Java 11 security updates - klausman@cumin2002
  • 11:57 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
  • 11:57 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin2002
  • 11:56 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 11:55 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2085-2086,2088-2089].codfw.wmnet
  • 11:55 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P70477 and previous config saved to /var/cache/conftool/dbconfig/20241022-114849-ladsgroup.json
  • 11:48 kart_: Updated cxserver to 2024-10-22-112806-production (T357950)
  • 11:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for tests: Don't depend on Message implementation details (T377778), Update for Message/MessageValue changes (T377778)
  • 11:47 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:45 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2009.codfw.wmnet
  • 11:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:43 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host wikikube-worker2085.codfw.wmnet
  • 11:43 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker2085.codfw.wmnet
  • 11:41 akosiaris: remove faidon from WMCS projects maps, visualeditor, swift, testlabs per his request. Keep the bastion project. cc paravoid
  • 11:39 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Java 11 security updates - klausman@cumin2002
  • 11:34 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestagemaster2005.codfw.wmnet
  • 11:34 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host kubestagemaster2005.codfw.wmnet
  • 11:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70476 and previous config saved to /var/cache/conftool/dbconfig/20241022-113342-ladsgroup.json
  • 11:27 moritzm: installing Java 11 security updates
  • 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T376905)', diff saved to https://phabricator.wikimedia.org/P70475 and previous config saved to /var/cache/conftool/dbconfig/20241022-112408-ladsgroup.json
  • 11:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70474 and previous config saved to /var/cache/conftool/dbconfig/20241022-112343-ladsgroup.json
  • 11:21 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: sync
  • 11:21 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: sync
  • 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P70473 and previous config saved to /var/cache/conftool/dbconfig/20241022-110836-ladsgroup.json
  • 11:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70472 and previous config saved to /var/cache/conftool/dbconfig/20241022-110744-arnaudb.json
  • 10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P70471 and previous config saved to /var/cache/conftool/dbconfig/20241022-105329-ladsgroup.json
  • 10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70470 and previous config saved to /var/cache/conftool/dbconfig/20241022-105238-arnaudb.json
  • 10:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70469 and previous config saved to /var/cache/conftool/dbconfig/20241022-103822-ladsgroup.json
  • 10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70468 and previous config saved to /var/cache/conftool/dbconfig/20241022-103733-arnaudb.json
  • 10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T376905)', diff saved to https://phabricator.wikimedia.org/P70467 and previous config saved to /var/cache/conftool/dbconfig/20241022-102907-ladsgroup.json
  • 10:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70466 and previous config saved to /var/cache/conftool/dbconfig/20241022-102843-ladsgroup.json
  • 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70465 and previous config saved to /var/cache/conftool/dbconfig/20241022-102227-arnaudb.json
  • 10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P70464 and previous config saved to /var/cache/conftool/dbconfig/20241022-101336-ladsgroup.json
  • 10:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 10:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:04 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: sync
  • 10:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:03 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: sync
  • 10:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2205.codfw.wmnet
  • 10:03 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@dcf019d]: (no justification provided) (duration: 00m 11s)
  • 10:02 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@dcf019d]: (no justification provided)
  • 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P70463 and previous config saved to /var/cache/conftool/dbconfig/20241022-095829-ladsgroup.json
  • 09:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70461 and previous config saved to /var/cache/conftool/dbconfig/20241022-094322-ladsgroup.json
  • 09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 09:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70460 and previous config saved to /var/cache/conftool/dbconfig/20241022-093345-ladsgroup.json
  • 09:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:22 hashar: Restarting CI Jenkins
  • 09:06 hashar: Restarting Gerrit
  • 08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
  • 08:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
  • 08:37 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2205.codfw.wmnet
  • 08:35 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:33 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70459 and previous config saved to /var/cache/conftool/dbconfig/20241022-082545-arnaudb.json
  • 08:24 moritzm: irc.wikimedia.org has been switched to ircstream T376014
  • 08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70457 and previous config saved to /var/cache/conftool/dbconfig/20241022-081040-arnaudb.json
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 08:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2149,2205].codfw.wmnet with reason: db2205 reclone
  • 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db[2149,2205].codfw.wmnet with reason: db2205 reclone
  • 07:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'T377718', diff saved to https://phabricator.wikimedia.org/P70456 and previous config saved to /var/cache/conftool/dbconfig/20241022-075830-arnaudb.json
  • 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70455 and previous config saved to /var/cache/conftool/dbconfig/20241022-075534-arnaudb.json
  • 07:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 28%: post clone', diff saved to https://phabricator.wikimedia.org/P70454 and previous config saved to /var/cache/conftool/dbconfig/20241022-074029-arnaudb.json
  • 07:28 moritzm: installing Java 17 security updates
  • 07:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 27%: post clone', diff saved to https://phabricator.wikimedia.org/P70453 and previous config saved to /var/cache/conftool/dbconfig/20241022-072523-arnaudb.json
  • 07:23 moritzm: rearm keyholder on netmon1003
  • 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 26%: post clone', diff saved to https://phabricator.wikimedia.org/P70452 and previous config saved to /var/cache/conftool/dbconfig/20241022-071018-arnaudb.json
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6003.wikimedia.org
  • 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
  • 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6003.wikimedia.org
  • 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2240 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70451 and previous config saved to /var/cache/conftool/dbconfig/20241022-065513-arnaudb.json
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 05:41 kart_: Remove servicerunner dependency for cxserver (T357950, T373777)
  • 05:31 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:30 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:25 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:24 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.25 (duration: 00m 58s)
  • 03:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.28 refs T375659 (duration: 49m 37s)
  • 03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.28 refs T375659
  • 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70450 and previous config saved to /var/cache/conftool/dbconfig/20241022-010820-ladsgroup.json
  • 00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P70449 and previous config saved to /var/cache/conftool/dbconfig/20241022-005313-ladsgroup.json
  • 00:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P70448 and previous config saved to /var/cache/conftool/dbconfig/20241022-003807-ladsgroup.json
  • 00:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70447 and previous config saved to /var/cache/conftool/dbconfig/20241022-002259-ladsgroup.json
  • 00:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 (T376905)', diff saved to https://phabricator.wikimedia.org/P70446 and previous config saved to /var/cache/conftool/dbconfig/20241022-001606-ladsgroup.json
  • 00:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70445 and previous config saved to /var/cache/conftool/dbconfig/20241022-001539-ladsgroup.json
  • 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P70444 and previous config saved to /var/cache/conftool/dbconfig/20241022-000032-ladsgroup.json

2024-10-21

  • 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P70443 and previous config saved to /var/cache/conftool/dbconfig/20241021-234525-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70442 and previous config saved to /var/cache/conftool/dbconfig/20241021-233018-ladsgroup.json
  • 23:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T376905)', diff saved to https://phabricator.wikimedia.org/P70441 and previous config saved to /var/cache/conftool/dbconfig/20241021-222952-ladsgroup.json
  • 22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70440 and previous config saved to /var/cache/conftool/dbconfig/20241021-222926-ladsgroup.json
  • 22:21 eileen: config revision changed from a1c7759c to 3bbf553d
  • 22:18 zabe@deploy2002: Finished scap sync-world: Backport for group0: Increase revision-slots cache expiry back to default (T183490) (duration: 06m 58s)
  • 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P70439 and previous config saved to /var/cache/conftool/dbconfig/20241021-221419-ladsgroup.json
  • 22:13 zabe@deploy2002: zabe: Continuing with sync
  • 22:13 zabe@deploy2002: zabe: Backport for group0: Increase revision-slots cache expiry back to default (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:11 zabe@deploy2002: Started scap sync-world: Backport for group0: Increase revision-slots cache expiry back to default (T183490)
  • 21:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P70438 and previous config saved to /var/cache/conftool/dbconfig/20241021-215912-ladsgroup.json
  • 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70437 and previous config saved to /var/cache/conftool/dbconfig/20241021-214405-ladsgroup.json
  • 21:43 eileen: config revision changed from d240bcfb to a1c7759c
  • 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T376905)', diff saved to https://phabricator.wikimedia.org/P70436 and previous config saved to /var/cache/conftool/dbconfig/20241021-213801-ladsgroup.json
  • 21:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70435 and previous config saved to /var/cache/conftool/dbconfig/20241021-213733-ladsgroup.json
  • 21:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P70434 and previous config saved to /var/cache/conftool/dbconfig/20241021-212226-ladsgroup.json
  • 21:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 21:16 swfrench-wmf: ran authdns-update to pick up mw-(web|api-ext)-next discovery records - T377040
  • 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P70433 and previous config saved to /var/cache/conftool/dbconfig/20241021-210718-ladsgroup.json
  • 21:00 sukhe: running authdns-update for CR 1081371
  • away: UTC late deploys done
  • 20:56 tgr@deploy2002: Finished scap sync-world: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476) (duration: 18m 43s)
  • 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70431 and previous config saved to /var/cache/conftool/dbconfig/20241021-205211-ladsgroup.json
  • 20:52 tgr@deploy2002: tgr: Continuing with sync
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T376905)', diff saved to https://phabricator.wikimedia.org/P70430 and previous config saved to /var/cache/conftool/dbconfig/20241021-204603-ladsgroup.json
  • 20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70429 and previous config saved to /var/cache/conftool/dbconfig/20241021-204536-ladsgroup.json
  • 20:40 tgr@deploy2002: tgr: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:37 tgr@deploy2002: Started scap sync-world: Backport for fix(AuthManagerStatsd): counters require static set of labels (T377476)
  • 20:32 tgr@deploy2002: Finished scap sync-world: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337) (duration: 08m 19s)
  • 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P70428 and previous config saved to /var/cache/conftool/dbconfig/20241021-203029-ladsgroup.json
  • 20:28 tgr@deploy2002: migr, tgr: Continuing with sync
  • 20:26 tgr@deploy2002: migr, tgr: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:24 tgr@deploy2002: Started scap sync-world: Backport for frwiki: switch clearing link recommendations to PageSaveComplete hook (T372337)
  • 20:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 20:21 tgr@deploy2002: Finished scap sync-world: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483) (duration: 09m 48s)
  • 20:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 20:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 20:16 tgr@deploy2002: matmarex, tgr: Continuing with sync
  • 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P70427 and previous config saved to /var/cache/conftool/dbconfig/20241021-201522-ladsgroup.json
  • 20:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 20:13 tgr@deploy2002: matmarex, tgr: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:11 tgr@deploy2002: Started scap sync-world: Backport for Re-apply "Set special footer licence message for MediaWiki.org re. Help: pages" (T301483)
  • 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70426 and previous config saved to /var/cache/conftool/dbconfig/20241021-200015-ladsgroup.json
  • 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T376905)', diff saved to https://phabricator.wikimedia.org/P70425 and previous config saved to /var/cache/conftool/dbconfig/20241021-195300-ladsgroup.json
  • 19:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70424 and previous config saved to /var/cache/conftool/dbconfig/20241021-195233-ladsgroup.json
  • 19:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P70423 and previous config saved to /var/cache/conftool/dbconfig/20241021-193726-ladsgroup.json
  • 19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=eqiad [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]
  • 19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next-ro,name=codfw [reason: preparing mw-api-ext-next-ro (a/a) for discovery - T377040]
  • 19:36 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@b75c4aa] (releasing): Deploying changes to MediaWiki branch and publish WMF single-version image job (duration: 01m 20s)
  • 19:36 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=eqiad [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]
  • 19:35 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next-ro,name=codfw [reason: preparing mw-web-next-ro (a/a) for discovery - T377040]
  • 19:34 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@b75c4aa] (releasing): Deploying changes to MediaWiki branch and publish WMF single-version image job
  • 19:31 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-next,name=codfw [reason: preparing mw-api-ext-next (a/p) for discovery - T377040]
  • 19:30 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-next,name=codfw [reason: preparing mw-web-next (a/p) for discovery - T377040]
  • 19:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P70422 and previous config saved to /var/cache/conftool/dbconfig/20241021-192219-ladsgroup.json
  • 19:11 ejegg: re-enabled fundraising thank you mailer
  • 19:10 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
  • 19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70421 and previous config saved to /var/cache/conftool/dbconfig/20241021-190712-ladsgroup.json
  • 19:04 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
  • 19:02 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
  • 19:02 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
  • 19:01 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040
  • 19:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T376905)', diff saved to https://phabricator.wikimedia.org/P70420 and previous config saved to /var/cache/conftool/dbconfig/20241021-185957-ladsgroup.json
  • 19:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 19:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70419 and previous config saved to /var/cache/conftool/dbconfig/20241021-185931-ladsgroup.json
  • 18:58 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
  • 18:52 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
  • 18:51 zabe@deploy2002: Finished scap sync-world: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 16m 09s)
  • 18:51 ejegg: fundraising civicrm upgraded from cfb0def0 to 36660cb3
  • 18:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
  • 18:45 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P70418 and previous config saved to /var/cache/conftool/dbconfig/20241021-184424-ladsgroup.json
  • 18:43 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
  • 18:42 zabe@deploy2002: zabe: Continuing with sync
  • 18:42 zabe@deploy2002: zabe: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:37 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 18:37 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
  • 18:36 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040
  • 18:35 zabe@deploy2002: Started scap sync-world: Backport for s4: Reduce revision-slots cache expiry to 60 seconds (T183490)
  • 18:32 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040
  • 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P70417 and previous config saved to /var/cache/conftool/dbconfig/20241021-182916-ladsgroup.json
  • 18:23 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
  • 18:22 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T377040)
  • 18:20 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
  • 18:19 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T377040)
  • 18:19 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:codfw' - T377040
  • 18:15 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
  • 18:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 18:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70416 and previous config saved to /var/cache/conftool/dbconfig/20241021-181410-ladsgroup.json
  • 18:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 18:09 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T377040)
  • 18:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T376905)', diff saved to https://phabricator.wikimedia.org/P70415 and previous config saved to /var/cache/conftool/dbconfig/20241021-180654-ladsgroup.json
  • 18:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70414 and previous config saved to /var/cache/conftool/dbconfig/20241021-180612-ladsgroup.json
  • 18:06 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
  • 18:05 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T377040)
  • 18:04 swfrench-wmf: ran and enabled puppet agent on 'A:lvs and A:eqiad' - T377040
  • 17:59 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T377040
  • 17:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 17:53 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm
  • 17:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 17:52 dduvall@deploy2002: Installing scap version "4.115.0" for 209 hosts
  • 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P70413 and previous config saved to /var/cache/conftool/dbconfig/20241021-175105-ladsgroup.json
  • 17:50 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@671896c]: Deploy T375402. (duration: 01m 04s)
  • 17:48 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@671896c]: Deploy T375402.
  • 17:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:42 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:41 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P70412 and previous config saved to /var/cache/conftool/dbconfig/20241021-173558-ladsgroup.json
  • 17:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70411 and previous config saved to /var/cache/conftool/dbconfig/20241021-172051-ladsgroup.json
  • 17:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T376905)', diff saved to https://phabricator.wikimedia.org/P70410 and previous config saved to /var/cache/conftool/dbconfig/20241021-171138-ladsgroup.json
  • 17:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70409 and previous config saved to /var/cache/conftool/dbconfig/20241021-171046-ladsgroup.json
  • 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70408 and previous config saved to /var/cache/conftool/dbconfig/20241021-165624-arnaudb.json
  • 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P70407 and previous config saved to /var/cache/conftool/dbconfig/20241021-165539-ladsgroup.json
  • 16:44 herron@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 16:43 herron@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70406 and previous config saved to /var/cache/conftool/dbconfig/20241021-164119-arnaudb.json
  • 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P70405 and previous config saved to /var/cache/conftool/dbconfig/20241021-164032-ladsgroup.json
  • 16:33 volans@cumin1002: dbctl commit (dc=all): 'Fix db1185 weight', diff saved to https://phabricator.wikimedia.org/P70404 and previous config saved to /var/cache/conftool/dbconfig/20241021-163355-volans.json
  • 16:32 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 quickly with 2 steps - Testing new cookbook
  • 16:29 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 quickly with 2 steps - Testing new cookbook
  • 16:29 volans@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1185 quickly with 2 steps - Testing new cookbook
  • 16:28 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 quickly with 2 steps - Testing new cookbook
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70401 and previous config saved to /var/cache/conftool/dbconfig/20241021-162613-arnaudb.json
  • 16:27 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
  • 16:26 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
  • 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70399 and previous config saved to /var/cache/conftool/dbconfig/20241021-162525-ladsgroup.json
  • 16:22 volans@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1185 - Testing new cookbook
  • 16:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:18 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
  • 16:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:17 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T376905)', diff saved to https://phabricator.wikimedia.org/P70398 and previous config saved to /var/cache/conftool/dbconfig/20241021-161701-ladsgroup.json
  • 16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 16:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70397 and previous config saved to /var/cache/conftool/dbconfig/20241021-161634-ladsgroup.json
  • 16:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2172 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70396 and previous config saved to /var/cache/conftool/dbconfig/20241021-161108-arnaudb.json
  • 16:04 ejegg: disabled fundraising Thank You mail send jobs
  • 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P70395 and previous config saved to /var/cache/conftool/dbconfig/20241021-160127-ladsgroup.json
  • 15:58 volans@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Testing new cookbook
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:55 volans@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Testing new cookbook
  • 15:53 volans@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Testing new cookbook
  • 15:53 volans@cumin1002: START - Cookbook sre.mysql.depool db1185 - Testing new cookbook
  • 15:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P70389 and previous config saved to /var/cache/conftool/dbconfig/20241021-154620-ladsgroup.json
  • 15:39 Dreamy_Jazz: Starting MediaModeration scanning script for 12 hrs on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 15:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 15:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2172.codfw.wmnet onto db2240.codfw.wmnet
  • 15:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 15:32 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 15:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70388 and previous config saved to /var/cache/conftool/dbconfig/20241021-153113-ladsgroup.json
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70387 and previous config saved to /var/cache/conftool/dbconfig/20241021-152408-ladsgroup.json
  • 15:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70386 and previous config saved to /var/cache/conftool/dbconfig/20241021-152339-ladsgroup.json
  • 15:20 moritzm: rearm keyholder on netmon2002
  • 15:20 stran@deploy2002: Finished scap sync-world: Backport for Disable local IP view right group on meta (T377584) (duration: 20m 29s)
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P70385 and previous config saved to /var/cache/conftool/dbconfig/20241021-150832-ladsgroup.json
  • 15:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 15:02 stran@deploy2002: stran: Continuing with sync
  • 15:01 stran@deploy2002: stran: Backport for Disable local IP view right group on meta (T377584) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:59 stran@deploy2002: Started scap sync-world: Backport for Disable local IP view right group on meta (T377584)
  • 14:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P70384 and previous config saved to /var/cache/conftool/dbconfig/20241021-145325-ladsgroup.json
  • 14:53 ejegg: disabled failing CiviCRM contact dedupe job
  • 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70383 and previous config saved to /var/cache/conftool/dbconfig/20241021-143818-ladsgroup.json
  • 14:33 herron@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:32 herron@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70382 and previous config saved to /var/cache/conftool/dbconfig/20241021-143108-ladsgroup.json
  • 14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70381 and previous config saved to /var/cache/conftool/dbconfig/20241021-143042-ladsgroup.json
  • 14:29 moritzm: installing PHP 8.2 security updates
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P70380 and previous config saved to /var/cache/conftool/dbconfig/20241021-141535-ladsgroup.json
  • 14:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:10 stran@deploy2002: Finished scap sync-world: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132) (duration: 14m 55s)
  • 14:05 stran@deploy2002: stran, kharlan: Continuing with sync
  • 14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P70379 and previous config saved to /var/cache/conftool/dbconfig/20241021-140028-ladsgroup.json
  • 13:57 stran@deploy2002: stran, kharlan: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:55 stran@deploy2002: Started scap sync-world: Backport for Disable IP reveal rights for local metawiki groups (T377584), Set redirect wiki for Special:GlobalContributions (T376612), temp accounts: Make temp accounts known on metawiki (T376132)
  • 13:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:50 stran@deploy2002: Finished scap sync-world: Backport for Apply wmf-specific protected vars rights access (T369610) (duration: 08m 53s)
  • 13:45 stran@deploy2002: stran: Continuing with sync
  • 13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70378 and previous config saved to /var/cache/conftool/dbconfig/20241021-134521-ladsgroup.json
  • 13:43 stran@deploy2002: stran: Backport for Apply wmf-specific protected vars rights access (T369610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:41 stran@deploy2002: Started scap sync-world: Backport for Apply wmf-specific protected vars rights access (T369610)
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70377 and previous config saved to /var/cache/conftool/dbconfig/20241021-133619-ladsgroup.json
  • 13:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70376 and previous config saved to /var/cache/conftool/dbconfig/20241021-133552-ladsgroup.json
  • 13:35 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
  • 13:34 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki" (duration: 08m 20s)
  • 13:33 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:33 inflatador: bking@stat1009,stat1010.mgmt racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled && racadm jobqueue create BIOS.Setup.1-1 T376813
  • 13:32 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:30 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2172.codfw.wmnet onto db2240.codfw.wmnet
  • 13:29 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:29 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, trainbranchbot: Continuing with sync
  • 13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:28 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, trainbranchbot: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:27 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:26 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Enable CampaignEvents collaboration list in testwiki and test2wiki"
  • 13:25 inflatador: bking@stat1008.mgmt racadm>>racadm jobqueue create BIOS.Setup.1-1
  • 13:24 inflatador: bking@stat1008.mgmt racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813
  • 13:24 lucaswerkmeister-wmde@deploy2002: Sync cancelled.
  • 13:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2172 in db2240 for T373579', diff saved to https://phabricator.wikimedia.org/P70375 and previous config saved to /var/cache/conftool/dbconfig/20241021-132351-arnaudb.json
  • 13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
  • 13:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
  • 13:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
  • 13:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: provisionning db2240.codfw.wmnet - T373579
  • 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P70374 and previous config saved to /var/cache/conftool/dbconfig/20241021-132045-ladsgroup.json
  • 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2172 to clone on db2240 T373579', diff saved to https://phabricator.wikimedia.org/P70373 and previous config saved to /var/cache/conftool/dbconfig/20241021-131750-arnaudb.json
  • 13:12 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test Ide32aa with dummy upgrade
  • 13:11 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test Ide32aa with dummy upgrade
  • 13:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (T376055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P70372 and previous config saved to /var/cache/conftool/dbconfig/20241021-130538-ladsgroup.json
  • 13:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CampaignEvents collaboration list in testwiki and test2wiki (T376055)
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
  • 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
  • 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70371 and previous config saved to /var/cache/conftool/dbconfig/20241021-125029-ladsgroup.json
  • 12:45 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T376905)', diff saved to https://phabricator.wikimedia.org/P70370 and previous config saved to /var/cache/conftool/dbconfig/20241021-124217-ladsgroup.json
  • 12:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70369 and previous config saved to /var/cache/conftool/dbconfig/20241021-124151-ladsgroup.json
  • 12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P70368 and previous config saved to /var/cache/conftool/dbconfig/20241021-122644-ladsgroup.json
  • 12:24 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
  • 12:21 klausman@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-lab1002.eqiad.wmnet with reason: host reimage
  • 12:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P70367 and previous config saved to /var/cache/conftool/dbconfig/20241021-121136-ladsgroup.json
  • 12:09 klausman@cumin1002: START - Cookbook sre.hosts.reimage for host ml-lab1002.eqiad.wmnet with OS bookworm
  • 12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:00 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 11:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70366 and previous config saved to /var/cache/conftool/dbconfig/20241021-115629-ladsgroup.json
  • 11:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 11:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 11:52 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 11:51 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70365 and previous config saved to /var/cache/conftool/dbconfig/20241021-114723-ladsgroup.json
  • 11:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 11:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70364 and previous config saved to /var/cache/conftool/dbconfig/20241021-114657-ladsgroup.json
  • 11:40 moritzm: installing python-idna security updates
  • 11:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P70363 and previous config saved to /var/cache/conftool/dbconfig/20241021-113150-ladsgroup.json
  • 11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P70362 and previous config saved to /var/cache/conftool/dbconfig/20241021-111643-ladsgroup.json
  • 11:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70361 and previous config saved to /var/cache/conftool/dbconfig/20241021-110136-ladsgroup.json
  • 10:59 moritzm: installing curl security updates
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
  • 10:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T376905)', diff saved to https://phabricator.wikimedia.org/P70360 and previous config saved to /var/cache/conftool/dbconfig/20241021-105223-ladsgroup.json
  • 10:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1029.eqiad.wmnet
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2038.codfw.wmnet to cluster codfw and group C
  • 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2038.codfw.wmnet to cluster codfw and group C
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
  • 10:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: testing depool/repool
  • 10:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1185.eqiad.wmnet with reason: testing depool/repool
  • 10:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: testing depool/repool
  • 10:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: testing depool/repool
  • 10:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1245.eqiad.wmnet with reason: testing depool/repool
  • 10:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1245.eqiad.wmnet with reason: testing depool/repool
  • 10:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1006.eqiad.wmnet
  • 10:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-eqiad: containerd migration
  • 10:08 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1005.eqiad.wmnet with OS bookworm
  • 10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1006.eqiad.wmnet
  • 09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:52 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2037.codfw.wmnet to cluster codfw and group C
  • 09:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2037.codfw.wmnet to cluster codfw and group C
  • 09:47 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
  • 09:46 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
  • 09:45 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 09:42 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
  • 09:40 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 09:39 dcausse@deploy2002: Finished scap sync-world: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715) (duration: 23m 26s)
  • 09:36 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1011.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
  • 09:32 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1011.eqiad.wmnet
  • 09:31 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1010.eqiad.wmnet
  • 09:29 dcausse@deploy2002: dcausse: Continuing with sync
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
  • 09:27 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1005.eqiad.wmnet with OS bookworm
  • 09:27 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1004.eqiad.wmnet with OS bookworm
  • 09:27 dcausse@deploy2002: dcausse: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:26 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1010.eqiad.wmnet
  • 09:24 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1009.eqiad.wmnet
  • 09:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
  • 09:19 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1009.eqiad.wmnet
  • 09:18 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
  • 09:16 dcausse@deploy2002: Started scap sync-world: Backport for Fix phan issue with getCounter returning NullMetric|CounterMetric, Do not pass null to DataSender::sendWeightedTagsUpdate $tagWeights (T376715)
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
  • 09:11 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
  • 09:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:10 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
  • 09:06 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
  • 09:03 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
  • 09:02 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
  • 08:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
  • 08:57 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
  • 08:53 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@d176c47]: (no justification provided) (duration: 00m 11s)
  • 08:53 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@d176c47]: (no justification provided)
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
  • 08:48 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1004.eqiad.wmnet with OS bookworm
  • 08:47 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1003.eqiad.wmnet with OS bookworm
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
  • 08:44 jnuche@deploy2002: Installing scap version "4.114.0" for 210 hosts
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
  • 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
  • 08:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
  • 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:23 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
  • 08:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster1003.eqiad.wmnet with OS bookworm
  • 08:09 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-eqiad: containerd migration
  • 07:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1013.eqiad.wmnet with OS bookworm
  • 07:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:23 moritzm: installing python-reportlab security updates
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast7001.wikimedia.org
  • 07:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast7001.wikimedia.org
  • 07:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
  • 07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: host reimage
  • 07:09 kartik@deploy2002: scap failed: <CalledProcessError> Command '['/usr/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.27/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.113.0) (duration: 00m 01s)
  • 07:09 kartik@deploy2002: Started scap sync-world: Backport for Enable Special:Contribute on bnwiki
  • 07:05 kartik@deploy2002: scap failed: <CalledProcessError> Command '['/usr/bin/scap', 'mwshell', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--', 'rm -f /srv/mediawiki-staging/php-1.43.0-wmf.27/cache/l10n/*.tmp.*']' returned non-zero exit status 126. (scap version: 4.113.0) (duration: 00m 01s)
  • 07:05 kartik@deploy2002: Started scap sync-world: Backport for Enable Special:Contribute on bnwiki
  • 06:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 153087
  • 06:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 153087
  • 06:58 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 153087
  • 06:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 153087
  • 06:56 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1013.eqiad.wmnet with OS bookworm
  • 06:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 06:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 06:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P70359 and previous config saved to /var/cache/conftool/dbconfig/20241021-000434-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance

2024-10-20

  • 21:19 eileen: civicrm upgraded from 77ea54bc to cfb0def0
  • 09:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70358 and previous config saved to /var/cache/conftool/dbconfig/20241020-095904-ladsgroup.json
  • 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70357 and previous config saved to /var/cache/conftool/dbconfig/20241020-094357-ladsgroup.json
  • 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70356 and previous config saved to /var/cache/conftool/dbconfig/20241020-092850-ladsgroup.json
  • 09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70355 and previous config saved to /var/cache/conftool/dbconfig/20241020-091344-ladsgroup.json

2024-10-19

  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 00:13 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART

2024-10-18

  • 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 22:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 21:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:45 dduvall@deploy2002: Finished deploy [releng/jenkins-deploy@8c1070f] (releasing): deploying changes to publishMWSingleVersion job (duration: 01m 06s)
  • 21:44 dduvall@deploy2002: Started deploy [releng/jenkins-deploy@8c1070f] (releasing): deploying changes to publishMWSingleVersion job
  • 20:23 dduvall: deployed scap release 4.113.0 to releases{1003,2003} hosts
  • 20:22 dduvall@deploy2002: Installing scap version "4.113.0" for 2 hosts
  • 20:21 dduvall@deploy2002: install-world aborted: (no justification provided) (duration: 00m 52s)
  • 20:20 dduvall@deploy2002: Installing scap version "latest" for 2 hosts
  • 19:09 tzatziki: removing 3 files for legal compliance
  • 18:56 tzatziki: removing 1 file for legal compliance
  • 16:54 dzahn@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:54 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: containerd migration
  • 16:54 dzahn@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:54 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 16:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 16:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 16:10 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 16:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2004.codfw.wmnet with OS bookworm
  • 15:46 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
  • 15:43 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
  • 15:26 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2004.codfw.wmnet with OS bookworm
  • 15:26 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS bookworm
  • 15:02 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 14:59 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
  • 14:57 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 14:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:53 akosiaris@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removal of old mx records and api.svc records - akosiaris@cumin1002"
  • 14:52 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Removal of old mx records and api.svc records - akosiaris@cumin1002"
  • 14:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@e44bacc]: Deploying updated dumps reconciliation (duration: 00m 31s)
  • 14:47 milimetric@deploy2002: Started deploy [airflow-dags/analytics@e44bacc]: Deploying updated dumps reconciliation
  • 14:39 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS bookworm
  • 14:38 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
  • 14:37 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
  • 14:37 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
  • 14:25 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 14:09 sergi0: Running `foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant` (T374544, T375753)
  • 13:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main'.
  • 13:46 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main'.
  • 13:32 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Hardware replacement
  • 13:31 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Hardware replacement
  • 13:22 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@f020959]: Deploying updated dumps reconciliation (duration: 00m 31s)
  • 13:22 milimetric@deploy2002: Started deploy [airflow-dags/analytics@f020959]: Deploying updated dumps reconciliation
  • 13:03 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main'.
  • 12:22 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:22 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 12:22 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 12:21 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 11:43 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: containerd migration
  • 11:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 11:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbstore1009.eqiad.wmnet
  • 11:31 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for dbstore1009.eqiad.wmnet
  • 11:21 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 11:17 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 11:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 11:00 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:59 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 10:59 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
  • 10:58 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2005.codfw.wmnet
  • 10:39 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster staging-codfw: containerd migration
  • 10:38 jayme@cumin1002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: containerd migration
  • 10:37 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=kubestagemaster2005.codfw.wmnet
  • 10:37 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubestagemaster2005.codfw.wmnet
  • 10:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 10:26 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:47 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 09:45 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 09:45 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:43 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 09:42 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:41 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:36 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync
  • 09:35 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync
  • 09:35 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 09:33 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 09:33 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync
  • 09:33 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync
  • 09:14 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 09:11 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 09:10 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 08:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T367856)', diff saved to https://phabricator.wikimedia.org/P70348 and previous config saved to /var/cache/conftool/dbconfig/20241018-080343-ladsgroup.json
  • 08:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 08:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70347 and previous config saved to /var/cache/conftool/dbconfig/20241018-015152-ladsgroup.json
  • 01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P70346 and previous config saved to /var/cache/conftool/dbconfig/20241018-013645-ladsgroup.json
  • 01:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P70345 and previous config saved to /var/cache/conftool/dbconfig/20241018-012138-ladsgroup.json
  • 01:16 eileen: civicrm upgraded from b0508a22 to 77ea54bc
  • 01:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70344 and previous config saved to /var/cache/conftool/dbconfig/20241018-010631-ladsgroup.json
  • 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70343 and previous config saved to /var/cache/conftool/dbconfig/20241018-005819-ladsgroup.json
  • 00:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 00:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 00:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70342 and previous config saved to /var/cache/conftool/dbconfig/20241018-005752-ladsgroup.json
  • 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove mgmt DNS entries for old frack switches - pt1979@cumin2002"
  • 00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P70341 and previous config saved to /var/cache/conftool/dbconfig/20241018-004245-ladsgroup.json
  • 00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove mgmt DNS entries for old frack switches - pt1979@cumin2002"
  • 00:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P70340 and previous config saved to /var/cache/conftool/dbconfig/20241018-002738-ladsgroup.json
  • 00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70339 and previous config saved to /var/cache/conftool/dbconfig/20241018-001231-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T376905)', diff saved to https://phabricator.wikimedia.org/P70338 and previous config saved to /var/cache/conftool/dbconfig/20241018-000422-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70337 and previous config saved to /var/cache/conftool/dbconfig/20241018-000356-ladsgroup.json

2024-10-17

  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P70336 and previous config saved to /var/cache/conftool/dbconfig/20241017-234849-ladsgroup.json
  • 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P70335 and previous config saved to /var/cache/conftool/dbconfig/20241017-233342-ladsgroup.json
  • 23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70334 and previous config saved to /var/cache/conftool/dbconfig/20241017-231835-ladsgroup.json
  • 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T376905)', diff saved to https://phabricator.wikimedia.org/P70333 and previous config saved to /var/cache/conftool/dbconfig/20241017-231037-ladsgroup.json
  • 23:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 23:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70332 and previous config saved to /var/cache/conftool/dbconfig/20241017-230457-ladsgroup.json
  • 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P70331 and previous config saved to /var/cache/conftool/dbconfig/20241017-224950-ladsgroup.json
  • 22:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 22:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70330 and previous config saved to /var/cache/conftool/dbconfig/20241017-224209-ladsgroup.json
  • 22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P70329 and previous config saved to /var/cache/conftool/dbconfig/20241017-223443-ladsgroup.json
  • 22:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P70328 and previous config saved to /var/cache/conftool/dbconfig/20241017-222702-ladsgroup.json
  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70327 and previous config saved to /var/cache/conftool/dbconfig/20241017-221936-ladsgroup.json
  • 22:15 eileen: civicrm upgraded from f980ace9 to b0508a22
  • 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P70326 and previous config saved to /var/cache/conftool/dbconfig/20241017-221155-ladsgroup.json
  • 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T376905)', diff saved to https://phabricator.wikimedia.org/P70325 and previous config saved to /var/cache/conftool/dbconfig/20241017-221123-ladsgroup.json
  • 22:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 22:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70324 and previous config saved to /var/cache/conftool/dbconfig/20241017-221057-ladsgroup.json
  • 21:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70323 and previous config saved to /var/cache/conftool/dbconfig/20241017-215648-ladsgroup.json
  • 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P70322 and previous config saved to /var/cache/conftool/dbconfig/20241017-215550-ladsgroup.json
  • 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T376905)', diff saved to https://phabricator.wikimedia.org/P70321 and previous config saved to /var/cache/conftool/dbconfig/20241017-215014-ladsgroup.json
  • 21:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 21:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70320 and previous config saved to /var/cache/conftool/dbconfig/20241017-214949-ladsgroup.json
  • 21:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P70319 and previous config saved to /var/cache/conftool/dbconfig/20241017-214043-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P70318 and previous config saved to /var/cache/conftool/dbconfig/20241017-213442-ladsgroup.json
  • 21:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70317 and previous config saved to /var/cache/conftool/dbconfig/20241017-212536-ladsgroup.json
  • 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P70316 and previous config saved to /var/cache/conftool/dbconfig/20241017-211935-ladsgroup.json
  • 21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70315 and previous config saved to /var/cache/conftool/dbconfig/20241017-211458-ladsgroup.json
  • 21:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70314 and previous config saved to /var/cache/conftool/dbconfig/20241017-211432-ladsgroup.json
  • 21:11 kindrobot: UTC late backport window finished <3
  • 21:08 kindrobot: results of de-duping: https://phabricator.wikimedia.org/P70313
  • 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70312 and previous config saved to /var/cache/conftool/dbconfig/20241017-210428-ladsgroup.json
  • 21:01 kindrobot: ran mwscript-k8s -f --comment="https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1080078/comments/02a9334e_cd3e7a0e" -- namespaceDupes.php on: bclwikisource, bewwiki, gorwikiquote, iglwiki, kaawiktionary, kgewiki, kuswiki, madwiktionary, moswiki, nrwiki, rskwiki, shnwikinews, and tddwiki
  • 20:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P70311 and previous config saved to /var/cache/conftool/dbconfig/20241017-205925-ladsgroup.json
  • 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T376905)', diff saved to https://phabricator.wikimedia.org/P70310 and previous config saved to /var/cache/conftool/dbconfig/20241017-205655-ladsgroup.json
  • 20:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70309 and previous config saved to /var/cache/conftool/dbconfig/20241017-205612-ladsgroup.json
  • 20:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:50 eileen: config revision changed from 150b02a9 to 0d019da0
  • 20:50 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:50 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 20:49 eileen: config revision changed from 3b3e5cad to 0d019da0
  • 20:48 kindrobot@deploy2002: Finished scap sync-world: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310) (duration: 31m 15s)
  • 20:46 eileen: config revision changed from bf02494d to 3b3e5cad
  • 20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P70308 and previous config saved to /var/cache/conftool/dbconfig/20241017-204418-ladsgroup.json
  • 20:43 kindrobot@deploy2002: pppery, kindrobot: Continuing with sync
  • 20:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P70307 and previous config saved to /var/cache/conftool/dbconfig/20241017-204105-ladsgroup.json
  • 20:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70306 and previous config saved to /var/cache/conftool/dbconfig/20241017-202911-ladsgroup.json
  • 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P70305 and previous config saved to /var/cache/conftool/dbconfig/20241017-202558-ladsgroup.json
  • 20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T376905)', diff saved to https://phabricator.wikimedia.org/P70304 and previous config saved to /var/cache/conftool/dbconfig/20241017-201944-ladsgroup.json
  • 20:20 kindrobot@deploy2002: pppery, kindrobot: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70303 and previous config saved to /var/cache/conftool/dbconfig/20241017-201919-ladsgroup.json
  • 20:17 kindrobot@deploy2002: Started scap sync-world: Backport for Configure namespaces, sitenames, and timezones for new wikis (T377160 T375102 T375017 T375424 T376572 T377088 T374644 T375024 T374815 T375095 T375433 T360303 T363256 T360310)
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70302 and previous config saved to /var/cache/conftool/dbconfig/20241017-201051-ladsgroup.json
  • 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P70301 and previous config saved to /var/cache/conftool/dbconfig/20241017-200412-ladsgroup.json
  • 20:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T376905)', diff saved to https://phabricator.wikimedia.org/P70300 and previous config saved to /var/cache/conftool/dbconfig/20241017-200147-ladsgroup.json
  • 20:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70299 and previous config saved to /var/cache/conftool/dbconfig/20241017-200122-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P70298 and previous config saved to /var/cache/conftool/dbconfig/20241017-194905-ladsgroup.json
  • 19:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70297 and previous config saved to /var/cache/conftool/dbconfig/20241017-194615-ladsgroup.json
  • 19:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70296 and previous config saved to /var/cache/conftool/dbconfig/20241017-193358-ladsgroup.json
  • 19:33 swfrench-wmf: ran authdns-update to pick up records for mw-(web|api-ext)-next in svc - T377040
  • 19:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70295 and previous config saved to /var/cache/conftool/dbconfig/20241017-193108-ladsgroup.json
  • 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T376905)', diff saved to https://phabricator.wikimedia.org/P70294 and previous config saved to /var/cache/conftool/dbconfig/20241017-192424-ladsgroup.json
  • 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:18 dancy@deploy2002: Finished scap sync-world: testing https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484 (duration: 02m 46s)
  • 19:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70293 and previous config saved to /var/cache/conftool/dbconfig/20241017-191601-ladsgroup.json
  • 19:15 dancy@deploy2002: Started scap sync-world: testing https://gitlab.wikimedia.org/repos/releng/scap/-/merge_requests/484
  • 19:13 dancy@deploy2002: Installing scap version "4.112.0" for 1 hosts
  • 19:07 dancy@deploy2002: Installing scap version "4.112.0" for 210 hosts
  • 19:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T376905)', diff saved to https://phabricator.wikimedia.org/P70292 and previous config saved to /var/cache/conftool/dbconfig/20241017-190655-ladsgroup.json
  • 19:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:49 dancy@deploy2002: Finished scap sync-world: testing scap 4.111.0 (duration: 02m 44s)
  • 18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 18:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 18:48 urbanecm: mwscript-k8s --comment=T377360 -f -- extensions/Flow/maintenance/FlowFixInconsistentBoards.php --wiki=wikidatawiki # T377360
  • 18:47 dancy@deploy2002: Started scap sync-world: testing scap 4.111.0
  • 18:45 dancy@deploy2002: Installation of scap version "4.111.0" completed for 210 hosts
  • 18:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70291 and previous config saved to /var/cache/conftool/dbconfig/20241017-184402-arnaudb.json
  • 18:41 dancy@deploy2002: Installing scap version "4.111.0" for 210 hosts
  • 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70290 and previous config saved to /var/cache/conftool/dbconfig/20241017-182855-arnaudb.json
  • 18:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 18:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 18:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.27 refs T375658
  • 18:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
  • 18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70289 and previous config saved to /var/cache/conftool/dbconfig/20241017-181348-arnaudb.json
  • 17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70288 and previous config saved to /var/cache/conftool/dbconfig/20241017-175841-arnaudb.json
  • 17:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 17:34 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-api-ext / mw-web next release bring up - T377040 (duration: 02m 54s)
  • 17:31 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-api-ext / mw-web next release bring up - T377040
  • 17:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70287 and previous config saved to /var/cache/conftool/dbconfig/20241017-171844-ladsgroup.json
  • 17:18 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:17 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:17 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:15 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:15 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
  • 17:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70286 and previous config saved to /var/cache/conftool/dbconfig/20241017-170337-ladsgroup.json
  • 16:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P70285 and previous config saved to /var/cache/conftool/dbconfig/20241017-165814-arnaudb.json
  • 16:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 16:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70284 and previous config saved to /var/cache/conftool/dbconfig/20241017-165803-arnaudb.json
  • 16:55 mutante: phab2002 T377396 - reboot | in addition to /etc/passwd also fix aphlict GID in /etc/group | fixed puppet run which can now create group vcs. now equivalent to prod server phab1004.
  • 16:53 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:52 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:51 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:50 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 16:49 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:49 mutante: phab2002 T377396 - fix UIDs/GIDs for phab-related system users: vcs: uid 496 -> 497 | aphlict: uid 497 -> uid 496, gid 497 -> gid 496 | chown aphlict:aphlict /var/log/aphlict | chown aphlict:aphlict /run/aphlict
  • 16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70283 and previous config saved to /var/cache/conftool/dbconfig/20241017-164830-ladsgroup.json
  • 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70282 and previous config saved to /var/cache/conftool/dbconfig/20241017-164256-arnaudb.json
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70281 and previous config saved to /var/cache/conftool/dbconfig/20241017-163324-ladsgroup.json
  • 16:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70280 and previous config saved to /var/cache/conftool/dbconfig/20241017-162749-arnaudb.json
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70279 and previous config saved to /var/cache/conftool/dbconfig/20241017-161242-arnaudb.json
  • 16:02 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 16:01 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 16:00 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 16:00 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 15:59 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:59 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 15:59 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:58 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 15:58 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:58 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:57 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 15:57 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 15:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 15:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:52 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 15:51 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 15:51 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 15:50 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:48 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 15:48 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 15:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 15:45 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:45 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:44 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:44 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:41 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70278 and previous config saved to /var/cache/conftool/dbconfig/20241017-153546-ladsgroup.json
  • 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70277 and previous config saved to /var/cache/conftool/dbconfig/20241017-153257-ladsgroup.json
  • 15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 15:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70276 and previous config saved to /var/cache/conftool/dbconfig/20241017-153238-ladsgroup.json
  • 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70275 and previous config saved to /var/cache/conftool/dbconfig/20241017-152040-ladsgroup.json
  • 15:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70274 and previous config saved to /var/cache/conftool/dbconfig/20241017-151731-ladsgroup.json
  • 15:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P70273 and previous config saved to /var/cache/conftool/dbconfig/20241017-151216-arnaudb.json
  • 15:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70272 and previous config saved to /var/cache/conftool/dbconfig/20241017-151204-arnaudb.json
  • 15:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70271 and previous config saved to /var/cache/conftool/dbconfig/20241017-150535-ladsgroup.json
  • 15:05 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:05 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70270 and previous config saved to /var/cache/conftool/dbconfig/20241017-150224-ladsgroup.json
  • 15:01 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 15:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70269 and previous config saved to /var/cache/conftool/dbconfig/20241017-145657-arnaudb.json
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:54 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:53 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:52 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70268 and previous config saved to /var/cache/conftool/dbconfig/20241017-145030-ladsgroup.json
  • 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:51 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:50 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:48 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:47 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70267 and previous config saved to /var/cache/conftool/dbconfig/20241017-144717-ladsgroup.json
  • 14:43 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:43 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70266 and previous config saved to /var/cache/conftool/dbconfig/20241017-144150-arnaudb.json
  • 14:41 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 14:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:38 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:38 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:31 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 14:28 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70265 and previous config saved to /var/cache/conftool/dbconfig/20241017-142643-arnaudb.json
  • 14:09 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 14:08 urbanecm@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287) (duration: 09m 41s)
  • 14:03 urbanecm@deploy2002: cscott, urbanecm: Continuing with sync
  • 14:00 urbanecm@deploy2002: cscott, urbanecm: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:58 urbanecm@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a26 (T377287), Bump wikimedia/parsoid to 0.20.0-a26 (T377287)
  • 13:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70264 and previous config saved to /var/cache/conftool/dbconfig/20241017-134651-ladsgroup.json
  • 13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70263 and previous config saved to /var/cache/conftool/dbconfig/20241017-134636-ladsgroup.json
  • 13:35 urbanecm@deploy2002: Finished scap sync-world: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been requierd (T377391 T357535) (duration: 08m 07s)
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70261 and previous config saved to /var/cache/conftool/dbconfig/20241017-133129-ladsgroup.json
  • 13:30 urbanecm@deploy2002: cscott, urbanecm, matmarex: Continuing with sync
  • 13:29 urbanecm@deploy2002: cscott, urbanecm, matmarex: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been requierd (T377391 T357535) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:29 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript updateCollation.php --wiki=cswikivoyage --previous-collation=uppercase # T377446
  • 13:27 urbanecm@deploy2002: Started scap sync-world: Backport for Set $wgAllowRawHtmlCopyrightMessages = false (T375789), tests: ensure maintenance base class has always been requierd (T377391 T357535)
  • 13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70260 and previous config saved to /var/cache/conftool/dbconfig/20241017-132617-arnaudb.json
  • 13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 13:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 13:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 13:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:18 inflatador: bking@wdqs1015 depooling to catch up on lag
  • 13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70258 and previous config saved to /var/cache/conftool/dbconfig/20241017-131622-ladsgroup.json
  • 13:14 urbanecm@deploy2002: Finished scap sync-world: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517) (duration: 07m 23s)
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T376905)', diff saved to https://phabricator.wikimedia.org/P70257 and previous config saved to /var/cache/conftool/dbconfig/20241017-131012-ladsgroup.json
  • 13:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 13:10 urbanecm@deploy2002: kharlan, urbanecm: Continuing with sync
  • 13:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 13:09 urbanecm@deploy2002: kharlan, urbanecm: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70256 and previous config saved to /var/cache/conftool/dbconfig/20241017-130947-ladsgroup.json
  • 13:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:07 urbanecm@deploy2002: Started scap sync-world: Backport for cswikivoyage: Set category collation to uca-cs-u-kn (T377446), QuickSurveys: Update safety survey coverage (T376517)
  • 13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70255 and previous config saved to /var/cache/conftool/dbconfig/20241017-130115-ladsgroup.json
  • 13:00 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 12:59 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 12:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P70254 and previous config saved to /var/cache/conftool/dbconfig/20241017-125440-ladsgroup.json
  • 12:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P70253 and previous config saved to /var/cache/conftool/dbconfig/20241017-123932-ladsgroup.json
  • 12:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70252 and previous config saved to /var/cache/conftool/dbconfig/20241017-122425-ladsgroup.json
  • 12:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T376905)', diff saved to https://phabricator.wikimedia.org/P70251 and previous config saved to /var/cache/conftool/dbconfig/20241017-121525-ladsgroup.json
  • 12:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70250 and previous config saved to /var/cache/conftool/dbconfig/20241017-120049-ladsgroup.json
  • 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70249 and previous config saved to /var/cache/conftool/dbconfig/20241017-120029-ladsgroup.json
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70248 and previous config saved to /var/cache/conftool/dbconfig/20241017-114522-ladsgroup.json
  • 11:39 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1177.eqiad.wmnet
  • 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70247 and previous config saved to /var/cache/conftool/dbconfig/20241017-113014-ladsgroup.json
  • 11:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70246 and previous config saved to /var/cache/conftool/dbconfig/20241017-111507-ladsgroup.json
  • 11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70245 and previous config saved to /var/cache/conftool/dbconfig/20241017-110527-ladsgroup.json
  • 11:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:17 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
  • 10:17 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
  • 09:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:34 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host phab2002.codfw.wmnet with OS bullseye
  • 09:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 09:09 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add support for read-only users - oblivian@cumin1002"
  • 09:09 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add support for read-only users - oblivian@cumin1002
  • 09:08 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add support for read-only users - oblivian@cumin1002
  • 09:08 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add support for read-only users - oblivian@cumin1002"
  • 09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: post clone', diff saved to https://phabricator.wikimedia.org/P70243 and previous config saved to /var/cache/conftool/dbconfig/20241017-090731-arnaudb.json
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: post clone', diff saved to https://phabricator.wikimedia.org/P70242 and previous config saved to /var/cache/conftool/dbconfig/20241017-085226-arnaudb.json
  • 08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: post clone', diff saved to https://phabricator.wikimedia.org/P70241 and previous config saved to /var/cache/conftool/dbconfig/20241017-083721-arnaudb.json
  • 08:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70240 and previous config saved to /var/cache/conftool/dbconfig/20241017-082215-arnaudb.json
  • 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 to reclone on db2205 - T377276', diff saved to https://phabricator.wikimedia.org/P70239 and previous config saved to /var/cache/conftool/dbconfig/20241017-081822-arnaudb.json
  • 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: post clone', diff saved to https://phabricator.wikimedia.org/P70238 and previous config saved to /var/cache/conftool/dbconfig/20241017-081802-arnaudb.json
  • 08:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2205.codfw.wmnet
  • 08:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 08:01 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 07:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:55 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:51 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 07:48 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
  • 07:37 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:37 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:28 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS bookworm
  • 07:19 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: cleanup removed label_count field on next re-index (T377226) (duration: 10m 40s)
  • 07:18 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubestagemaster2005.codfw.wmnet
  • 07:14 dcausse@deploy2002: dcausse: Continuing with sync
  • 07:13 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
  • 07:13 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on kubestagemaster2005.codfw.wmnet with reason: reimage
  • 07:13 dcausse@deploy2002: dcausse: Backport for cirrus: cleanup removed label_count field on next re-index (T377226) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: cleanup removed label_count field on next re-index (T377226)
  • 07:00 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2205.codfw.wmnet
  • 07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2149 to reclone on db2205 - T377276', diff saved to https://phabricator.wikimedia.org/P70237 and previous config saved to /var/cache/conftool/dbconfig/20241017-070015-arnaudb.json
  • 06:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2205.codfw.wmnet with OS bookworm
  • 06:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: T367781', diff saved to https://phabricator.wikimedia.org/P70236 and previous config saved to /var/cache/conftool/dbconfig/20241017-063238-arnaudb.json
  • 06:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
  • 06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
  • 06:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: T367781', diff saved to https://phabricator.wikimedia.org/P70235 and previous config saved to /var/cache/conftool/dbconfig/20241017-061732-arnaudb.json
  • 06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2205.codfw.wmnet with OS bookworm
  • 06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: T367781', diff saved to https://phabricator.wikimedia.org/P70234 and previous config saved to /var/cache/conftool/dbconfig/20241017-060227-arnaudb.json
  • 05:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: T367781', diff saved to https://phabricator.wikimedia.org/P70233 and previous config saved to /var/cache/conftool/dbconfig/20241017-054722-arnaudb.json
  • 05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70231 and previous config saved to /var/cache/conftool/dbconfig/20241017-051700-ladsgroup.json
  • 05:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P70230 and previous config saved to /var/cache/conftool/dbconfig/20241017-050153-ladsgroup.json
  • 04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P70229 and previous config saved to /var/cache/conftool/dbconfig/20241017-044646-ladsgroup.json
  • 04:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70228 and previous config saved to /var/cache/conftool/dbconfig/20241017-043139-ladsgroup.json
  • 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T376905)', diff saved to https://phabricator.wikimedia.org/P70227 and previous config saved to /var/cache/conftool/dbconfig/20241017-042440-ladsgroup.json
  • 04:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70226 and previous config saved to /var/cache/conftool/dbconfig/20241017-042413-ladsgroup.json
  • 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P70225 and previous config saved to /var/cache/conftool/dbconfig/20241017-040906-ladsgroup.json
  • 03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P70224 and previous config saved to /var/cache/conftool/dbconfig/20241017-035359-ladsgroup.json
  • 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70223 and previous config saved to /var/cache/conftool/dbconfig/20241017-033852-ladsgroup.json
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70222 and previous config saved to /var/cache/conftool/dbconfig/20241017-033144-ladsgroup.json
  • 03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70221 and previous config saved to /var/cache/conftool/dbconfig/20241017-033118-ladsgroup.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P70220 and previous config saved to /var/cache/conftool/dbconfig/20241017-031611-ladsgroup.json
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P70219 and previous config saved to /var/cache/conftool/dbconfig/20241017-030104-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70218 and previous config saved to /var/cache/conftool/dbconfig/20241017-024557-ladsgroup.json
  • 02:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T376905)', diff saved to https://phabricator.wikimedia.org/P70217 and previous config saved to /var/cache/conftool/dbconfig/20241017-023857-ladsgroup.json
  • 02:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70216 and previous config saved to /var/cache/conftool/dbconfig/20241017-023831-ladsgroup.json
  • 02:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P70215 and previous config saved to /var/cache/conftool/dbconfig/20241017-022324-ladsgroup.json
  • 02:18 tstarling@deploy2002: Synchronized wmf-config/InitialiseSettings.php: T4085 Enable en on Commons and Meta (duration: 06m 34s)
  • 02:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P70214 and previous config saved to /var/cache/conftool/dbconfig/20241017-020817-ladsgroup.json
  • 01:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70213 and previous config saved to /var/cache/conftool/dbconfig/20241017-015310-ladsgroup.json
  • 01:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T376905)', diff saved to https://phabricator.wikimedia.org/P70212 and previous config saved to /var/cache/conftool/dbconfig/20241017-014500-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 01:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70211 and previous config saved to /var/cache/conftool/dbconfig/20241017-013926-ladsgroup.json
  • 01:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P70210 and previous config saved to /var/cache/conftool/dbconfig/20241017-012419-ladsgroup.json
  • 01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P70209 and previous config saved to /var/cache/conftool/dbconfig/20241017-010912-ladsgroup.json
  • 00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70208 and previous config saved to /var/cache/conftool/dbconfig/20241017-005405-ladsgroup.json
  • 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T376905)', diff saved to https://phabricator.wikimedia.org/P70207 and previous config saved to /var/cache/conftool/dbconfig/20241017-004537-ladsgroup.json
  • 00:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70206 and previous config saved to /var/cache/conftool/dbconfig/20241017-004511-ladsgroup.json
  • 00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P70204 and previous config saved to /var/cache/conftool/dbconfig/20241017-003004-ladsgroup.json
  • 00:26 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 00:25 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 00:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P70203 and previous config saved to /var/cache/conftool/dbconfig/20241017-001457-ladsgroup.json

2024-10-16

  • 23:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70202 and previous config saved to /var/cache/conftool/dbconfig/20241016-235950-ladsgroup.json
  • 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T376905)', diff saved to https://phabricator.wikimedia.org/P70201 and previous config saved to /var/cache/conftool/dbconfig/20241016-235129-ladsgroup.json
  • 23:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70200 and previous config saved to /var/cache/conftool/dbconfig/20241016-235102-ladsgroup.json
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70199 and previous config saved to /var/cache/conftool/dbconfig/20241016-233555-ladsgroup.json
  • 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P70198 and previous config saved to /var/cache/conftool/dbconfig/20241016-232048-ladsgroup.json
  • 23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70197 and previous config saved to /var/cache/conftool/dbconfig/20241016-230541-ladsgroup.json
  • 22:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T376905)', diff saved to https://phabricator.wikimedia.org/P70196 and previous config saved to /var/cache/conftool/dbconfig/20241016-225716-ladsgroup.json
  • 22:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70195 and previous config saved to /var/cache/conftool/dbconfig/20241016-225646-ladsgroup.json
  • 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70194 and previous config saved to /var/cache/conftool/dbconfig/20241016-224139-ladsgroup.json
  • 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P70193 and previous config saved to /var/cache/conftool/dbconfig/20241016-222632-ladsgroup.json
  • 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70192 and previous config saved to /var/cache/conftool/dbconfig/20241016-221125-ladsgroup.json
  • 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T376905)', diff saved to https://phabricator.wikimedia.org/P70191 and previous config saved to /var/cache/conftool/dbconfig/20241016-220053-ladsgroup.json
  • 22:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 22:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 21:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 21:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 21:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:44 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 20:44 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 20:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 20:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 20:39 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:39 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:37 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:37 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:31 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 00m 08s)
  • 20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70189 and previous config saved to /var/cache/conftool/dbconfig/20241016-203034-ladsgroup.json
  • 20:30 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374
  • 20:29 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:29 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:26 jhuneidi@deploy2002: Finished scap sync-world: Backport for Make wikitech a target for CentralNotice banners (T377030) (duration: 10m 02s)
  • 20:21 jhuneidi@deploy2002: ejegg, jhuneidi: Continuing with sync
  • 20:18 jhuneidi@deploy2002: ejegg, jhuneidi: Backport for Make wikitech a target for CentralNotice banners (T377030) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 mutante: phab2002 - ln -s /var/lib/scap/scap/bin/scap /usr/bin/scap
  • 20:17 mutante: phab2002 - after manually running bootstrap-scap-target.sh, which reported "Scap from local bullseye wheels successfully installed at /var/lib/scap/scap", still getting "cannot open `/usr/bin/scap' (No such file or directory)". T303559 T310740 T377374
  • 20:17 jhuneidi@deploy2002: Started scap sync-world: Backport for Make wikitech a target for CentralNotice banners (T377030)
  • 20:16 mutante: phab2002 - manually bootstrapping scap since puppet did not do it due to dependency cycles: sudo -u scap /usr/local/bin/bootstrap-scap-target.sh deploy2002.codfw.wmnet /var/lib/scap T303559 T310740 T377374
  • 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70188 and previous config saved to /var/cache/conftool/dbconfig/20241016-201527-ladsgroup.json
  • 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P70187 and previous config saved to /var/cache/conftool/dbconfig/20241016-200020-ladsgroup.json
  • 19:54 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out1001.wikimedia.org
  • 19:50 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out1001.wikimedia.org
  • 19:49 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-out2001.wikimedia.org
  • 19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
  • 19:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70186 and previous config saved to /var/cache/conftool/dbconfig/20241016-194513-ladsgroup.json
  • 19:47 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
  • 19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
  • 19:46 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
  • 19:45 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
  • 19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx1001.wikimedia.org
  • 19:44 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:44 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
  • 19:43 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
  • 19:42 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mx-out2001.wikimedia.org
  • 19:42 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-out2001.wikimedia.org
  • 19:40 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
  • 19:36 jhathaway@cumin1002: START - Cookbook sre.hosts.decommission for hosts mx1001.wikimedia.org
  • 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T376905)', diff saved to https://phabricator.wikimedia.org/P70185 and previous config saved to /var/cache/conftool/dbconfig/20241016-193500-ladsgroup.json
  • 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70184 and previous config saved to /var/cache/conftool/dbconfig/20241016-193433-ladsgroup.json
  • 19:30 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374 (duration: 10m 42s)
  • 19:19 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab2002 for T377374
  • 19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70183 and previous config saved to /var/cache/conftool/dbconfig/20241016-191926-ladsgroup.json
  • 19:16 inflatador: bking@stat1011 racadm>>racadm jobqueue create BIOS.Setup.1-1 Commit JID = JID_291241139935 T376813
  • 19:14 inflatador: bking@stat1011 racadm>>racadm set BIOS.MemSettings.NodeInterleave Enabled T376813
  • 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P70182 and previous config saved to /var/cache/conftool/dbconfig/20241016-190419-ladsgroup.json
  • 18:54 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70181 and previous config saved to /var/cache/conftool/dbconfig/20241016-184912-ladsgroup.json
  • 18:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mx2001.wikimedia.org
  • 18:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
  • 18:45 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mx2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jhathaway@cumin1002"
  • 18:43 papaul: maintenance on mr1-ulsfo complete
  • 18:41 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
  • 18:36 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 18:35 jhathaway@cumin1002: START - Cookbook sre.hosts.decommission for hosts mx2001.wikimedia.org
  • 18:33 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
  • 18:32 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 18:32 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 18:31 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 18:31 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: host reimage
  • 18:27 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:27 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:21 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:20 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:17 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.27 refs T375658
  • 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host phab2002
  • 18:13 dzahn@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host phab2002
  • 18:13 dzahn@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host phab2002
  • 18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:12 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache phab2002.codfw.wmnet 54.32.192.10.in-addr.arpa 4.5.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002"
  • 18:11 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:11 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host phab2002 - dzahn@cumin2002"
  • 18:11 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:06 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 18:05 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:04 cdanis@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:01 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:00 papaul: ongoing maintenance on mr1-ulsfo
  • 18:00 cdanis@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:58 dzahn@cumin2002: START - Cookbook sre.hosts.move-vlan for host phab2002
  • 17:58 cdanis@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host phab2002.codfw.wmnet with OS bullseye
  • 17:56 cdanis@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:55 cdanis@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T376905)', diff saved to https://phabricator.wikimedia.org/P70179 and previous config saved to /var/cache/conftool/dbconfig/20241016-174847-ladsgroup.json
  • 17:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70178 and previous config saved to /var/cache/conftool/dbconfig/20241016-174821-ladsgroup.json
  • 17:48 swfrench@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 swfrench@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002"
  • 17:41 swfrench@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly allocated LVS VIPs for mw-web-next and mw-api-ext-next - swfrench@cumin2002"
  • 17:39 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:38 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:37 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 17:37 swfrench@cumin2002: START - Cookbook sre.dns.netbox
  • 17:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70177 and previous config saved to /var/cache/conftool/dbconfig/20241016-173314-ladsgroup.json
  • 17:20 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P70176 and previous config saved to /var/cache/conftool/dbconfig/20241016-171807-ladsgroup.json
  • 17:16 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a] (duration: 03m 44s)
  • 17:13 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f186c94a]
  • 17:12 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a] (duration: 05m 11s)
  • 17:06 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94] (thin): Regular analytics weekly train THIN [analytics/refinery@f186c94a]
  • 17:06 xcollazo@deploy2002: Finished deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a] (duration: 08m 54s)
  • 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70175 and previous config saved to /var/cache/conftool/dbconfig/20241016-170300-ladsgroup.json
  • 16:57 xcollazo@deploy2002: Started deploy [analytics/refinery@f186c94]: Regular analytics weekly train [analytics/refinery@f186c94a]
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T376905)', diff saved to https://phabricator.wikimedia.org/P70174 and previous config saved to /var/cache/conftool/dbconfig/20241016-165343-ladsgroup.json
  • 16:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70173 and previous config saved to /var/cache/conftool/dbconfig/20241016-165317-ladsgroup.json
  • 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70172 and previous config saved to /var/cache/conftool/dbconfig/20241016-163810-ladsgroup.json
  • 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P70171 and previous config saved to /var/cache/conftool/dbconfig/20241016-162303-ladsgroup.json
  • 16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70170 and previous config saved to /var/cache/conftool/dbconfig/20241016-160756-ladsgroup.json
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T376905)', diff saved to https://phabricator.wikimedia.org/P70169 and previous config saved to /var/cache/conftool/dbconfig/20241016-155948-ladsgroup.json
  • 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 15:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70168 and previous config saved to /var/cache/conftool/dbconfig/20241016-155450-ladsgroup.json
  • 15:52 papaul: maintenance on mr1-eqsin complete
  • 15:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70167 and previous config saved to /var/cache/conftool/dbconfig/20241016-153943-ladsgroup.json
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P70166 and previous config saved to /var/cache/conftool/dbconfig/20241016-152436-ladsgroup.json
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70165 and previous config saved to /var/cache/conftool/dbconfig/20241016-150928-ladsgroup.json
  • 15:05 papaul: ongoing maintenance on mr1-eqsin
  • 14:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:41 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] beta: Lower batch size for reassignMenteesJob (T376124) (duration: 06m 46s)
  • 14:35 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] beta: Lower batch size for reassignMenteesJob (T376124)
  • 14:25 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:25 Lucas_WMDE: [cont.] 7)]], Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226) (duration: 11m 36s)
  • 14:24 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T37719
  • 14:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
  • 14:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002
  • 14:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 14:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70164 and previous config saved to /var/cache/conftool/dbconfig/20241016-141819-ladsgroup.json
  • 14:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - oblivian@cumin1002
  • 14:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
  • 14:17 oblivian@cumin1002: END (FAIL) - Cookbook sre.deploy.hiddenparma (exit_code=99) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
  • 14:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - oblivian@cumin1002"
  • 14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:15 Lucas_WMDE: [cont.] ], Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197)]
  • 14:13 Lucas_WMDE: [cont.] ), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226)
  • 14:13 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197), Hard-code LabelCountField::NAME (T377226), Remove LabelCountField (T377226), Drop label_count field (LabelCountField) (T377226), [[gerrit:1080703|Tests: Skip testViewForExistingGlobalTemporaryAccount (T377197
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70163 and previous config saved to /var/cache/conftool/dbconfig/20241016-140902-ladsgroup.json
  • 14:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 14:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70162 and previous config saved to /var/cache/conftool/dbconfig/20241016-140835-ladsgroup.json
  • 14:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70161 and previous config saved to /var/cache/conftool/dbconfig/20241016-140312-ladsgroup.json
  • 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70160 and previous config saved to /var/cache/conftool/dbconfig/20241016-135328-ladsgroup.json
  • 13:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70159 and previous config saved to /var/cache/conftool/dbconfig/20241016-134805-ladsgroup.json
  • 13:43 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 13:41 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70158 and previous config saved to /var/cache/conftool/dbconfig/20241016-133821-ladsgroup.json
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P70157 and previous config saved to /var/cache/conftool/dbconfig/20241016-133257-ladsgroup.json
  • 13:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Update Z669x references to Z609x (duration: 08m 23s)
  • 13:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70156 and previous config saved to /var/cache/conftool/dbconfig/20241016-132314-ladsgroup.json
  • 13:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jforrester: Continuing with sync
  • 13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jforrester: Backport for Update Z669x references to Z609x synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Update Z669x references to Z609x
  • 13:16 Dreamy_Jazz: Started time limited scan on enwiki - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 13:16 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove wgGEUseNewImpactModule config (T350077) (duration: 11m 35s)
  • 13:11 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cyndywikime: Continuing with sync
  • 13:07 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cyndywikime: Backport for Remove wgGEUseNewImpactModule config (T350077) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove wgGEUseNewImpactModule config (T350077)
  • 12:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 12:52 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 12:47 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 12:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 12:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:43 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 12:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1177
  • 12:35 stevemunene@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1177
  • 12:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1176
  • 12:34 stevemunene@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1176
  • 12:33 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 12:32 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:32 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002"
  • 12:32 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly reassigned an-worker hosts in analytics eqiad - stevemunene@cumin1002"
  • 12:28 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
  • 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T376905)', diff saved to https://phabricator.wikimedia.org/P70155 and previous config saved to /var/cache/conftool/dbconfig/20241016-122248-ladsgroup.json
  • 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70154 and previous config saved to /var/cache/conftool/dbconfig/20241016-122206-ladsgroup.json
  • 12:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70153 and previous config saved to /var/cache/conftool/dbconfig/20241016-120659-ladsgroup.json
  • 11:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P70152 and previous config saved to /var/cache/conftool/dbconfig/20241016-115152-ladsgroup.json
  • 11:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70150 and previous config saved to /var/cache/conftool/dbconfig/20241016-113645-ladsgroup.json
  • 11:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70149 and previous config saved to /var/cache/conftool/dbconfig/20241016-113639-ladsgroup.json
  • 11:29 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:28 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:26 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:25 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:22 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70148 and previous config saved to /var/cache/conftool/dbconfig/20241016-112132-ladsgroup.json
  • 11:21 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70147 and previous config saved to /var/cache/conftool/dbconfig/20241016-110625-ladsgroup.json
  • 10:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70146 and previous config saved to /var/cache/conftool/dbconfig/20241016-105118-ladsgroup.json
  • 10:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T376905)', diff saved to https://phabricator.wikimedia.org/P70145 and previous config saved to /var/cache/conftool/dbconfig/20241016-103620-ladsgroup.json
  • 10:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70144 and previous config saved to /var/cache/conftool/dbconfig/20241016-103553-ladsgroup.json
  • 10:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70143 and previous config saved to /var/cache/conftool/dbconfig/20241016-102046-ladsgroup.json
  • 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P70142 and previous config saved to /var/cache/conftool/dbconfig/20241016-100539-ladsgroup.json
  • 09:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70141 and previous config saved to /var/cache/conftool/dbconfig/20241016-095032-ladsgroup.json
  • 09:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T376905)', diff saved to https://phabricator.wikimedia.org/P70140 and previous config saved to /var/cache/conftool/dbconfig/20241016-093852-ladsgroup.json
  • 09:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 09:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70139 and previous config saved to /var/cache/conftool/dbconfig/20241016-093147-ladsgroup.json
  • 09:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T371742)', diff saved to https://phabricator.wikimedia.org/P70138 and previous config saved to /var/cache/conftool/dbconfig/20241016-092219-ladsgroup.json
  • 09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 09:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70137 and previous config saved to /var/cache/conftool/dbconfig/20241016-092157-ladsgroup.json
  • 09:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70136 and previous config saved to /var/cache/conftool/dbconfig/20241016-091640-ladsgroup.json
  • 09:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70134 and previous config saved to /var/cache/conftool/dbconfig/20241016-090650-ladsgroup.json
  • 09:04 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P70133 and previous config saved to /var/cache/conftool/dbconfig/20241016-090133-ladsgroup.json
  • 08:57 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70132 and previous config saved to /var/cache/conftool/dbconfig/20241016-085143-ladsgroup.json
  • 08:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70131 and previous config saved to /var/cache/conftool/dbconfig/20241016-084626-ladsgroup.json
  • 08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T376905)', diff saved to https://phabricator.wikimedia.org/P70130 and previous config saved to /var/cache/conftool/dbconfig/20241016-083651-ladsgroup.json
  • 08:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70129 and previous config saved to /var/cache/conftool/dbconfig/20241016-083636-ladsgroup.json
  • 08:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 08:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:07 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:05 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:04 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:03 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:02 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:01 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:41 awight: UTC morning deployments done
  • 07:40 awight@deploy2002: Finished scap sync-world: Backport for zhwiki: Revise contact page deprecated usage (duration: 09m 07s)
  • 07:35 awight@deploy2002: awight, hamishz: Continuing with sync
  • 07:34 awight@deploy2002: awight, hamishz: Backport for zhwiki: Revise contact page deprecated usage synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:31 awight@deploy2002: Started scap sync-world: Backport for zhwiki: Revise contact page deprecated usage
  • 07:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70128 and previous config saved to /var/cache/conftool/dbconfig/20241016-072501-ladsgroup.json
  • 07:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70127 and previous config saved to /var/cache/conftool/dbconfig/20241016-070954-ladsgroup.json
  • 07:09 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T371742)', diff saved to https://phabricator.wikimedia.org/P70126 and previous config saved to /var/cache/conftool/dbconfig/20241016-070246-ladsgroup.json
  • 07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70125 and previous config saved to /var/cache/conftool/dbconfig/20241016-070224-ladsgroup.json
  • 06:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P70124 and previous config saved to /var/cache/conftool/dbconfig/20241016-065447-ladsgroup.json
  • 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70123 and previous config saved to /var/cache/conftool/dbconfig/20241016-064717-ladsgroup.json
  • 06:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70122 and previous config saved to /var/cache/conftool/dbconfig/20241016-063940-ladsgroup.json
  • 06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70121 and previous config saved to /var/cache/conftool/dbconfig/20241016-063210-ladsgroup.json
  • 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T376905)', diff saved to https://phabricator.wikimedia.org/P70120 and previous config saved to /var/cache/conftool/dbconfig/20241016-063132-ladsgroup.json
  • 06:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 06:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70119 and previous config saved to /var/cache/conftool/dbconfig/20241016-063107-ladsgroup.json
  • 06:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70118 and previous config saved to /var/cache/conftool/dbconfig/20241016-061703-ladsgroup.json
  • 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70117 and previous config saved to /var/cache/conftool/dbconfig/20241016-061558-ladsgroup.json
  • 06:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P70116 and previous config saved to /var/cache/conftool/dbconfig/20241016-060051-ladsgroup.json
  • 05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70115 and previous config saved to /var/cache/conftool/dbconfig/20241016-054544-ladsgroup.json
  • 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T376905)', diff saved to https://phabricator.wikimedia.org/P70114 and previous config saved to /var/cache/conftool/dbconfig/20241016-053943-ladsgroup.json
  • 05:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 05:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 05:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70113 and previous config saved to /var/cache/conftool/dbconfig/20241016-053918-ladsgroup.json
  • 05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70112 and previous config saved to /var/cache/conftool/dbconfig/20241016-052411-ladsgroup.json
  • 05:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P70111 and previous config saved to /var/cache/conftool/dbconfig/20241016-050904-ladsgroup.json
  • 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70110 and previous config saved to /var/cache/conftool/dbconfig/20241016-045356-ladsgroup.json
  • 04:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T376905)', diff saved to https://phabricator.wikimedia.org/P70109 and previous config saved to /var/cache/conftool/dbconfig/20241016-044657-ladsgroup.json
  • 04:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 04:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 04:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 04:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70108 and previous config saved to /var/cache/conftool/dbconfig/20241016-044204-ladsgroup.json
  • 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T371742)', diff saved to https://phabricator.wikimedia.org/P70107 and previous config saved to /var/cache/conftool/dbconfig/20241016-043757-ladsgroup.json
  • 04:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70106 and previous config saved to /var/cache/conftool/dbconfig/20241016-043734-ladsgroup.json
  • 04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70105 and previous config saved to /var/cache/conftool/dbconfig/20241016-042657-ladsgroup.json
  • 04:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70104 and previous config saved to /var/cache/conftool/dbconfig/20241016-042227-ladsgroup.json
  • 04:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
  • 04:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
  • 04:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P70103 and previous config saved to /var/cache/conftool/dbconfig/20241016-041150-ladsgroup.json
  • 04:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P70102 and previous config saved to /var/cache/conftool/dbconfig/20241016-040721-ladsgroup.json
  • 04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:05 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
  • 04:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for new frack devices - pt1979@cumin2002"
  • 04:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 03:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70101 and previous config saved to /var/cache/conftool/dbconfig/20241016-035643-ladsgroup.json
  • 03:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70100 and previous config saved to /var/cache/conftool/dbconfig/20241016-035214-ladsgroup.json
  • 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T376905)', diff saved to https://phabricator.wikimedia.org/P70099 and previous config saved to /var/cache/conftool/dbconfig/20241016-034932-ladsgroup.json
  • 03:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 03:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70098 and previous config saved to /var/cache/conftool/dbconfig/20241016-034907-ladsgroup.json
  • 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70097 and previous config saved to /var/cache/conftool/dbconfig/20241016-033400-ladsgroup.json
  • 03:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P70096 and previous config saved to /var/cache/conftool/dbconfig/20241016-031852-ladsgroup.json
  • 03:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70095 and previous config saved to /var/cache/conftool/dbconfig/20241016-030345-ladsgroup.json
  • 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T376905)', diff saved to https://phabricator.wikimedia.org/P70094 and previous config saved to /var/cache/conftool/dbconfig/20241016-025633-ladsgroup.json
  • 02:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 02:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70093 and previous config saved to /var/cache/conftool/dbconfig/20241016-025608-ladsgroup.json
  • 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70092 and previous config saved to /var/cache/conftool/dbconfig/20241016-024101-ladsgroup.json
  • 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P70091 and previous config saved to /var/cache/conftool/dbconfig/20241016-022554-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T371742)', diff saved to https://phabricator.wikimedia.org/P70090 and previous config saved to /var/cache/conftool/dbconfig/20241016-021358-ladsgroup.json
  • 02:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 02:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 02:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70089 and previous config saved to /var/cache/conftool/dbconfig/20241016-021347-ladsgroup.json
  • 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70088 and previous config saved to /var/cache/conftool/dbconfig/20241016-021047-ladsgroup.json
  • 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T376905)', diff saved to https://phabricator.wikimedia.org/P70087 and previous config saved to /var/cache/conftool/dbconfig/20241016-020333-ladsgroup.json
  • 02:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70086 and previous config saved to /var/cache/conftool/dbconfig/20241016-020308-ladsgroup.json
  • 01:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70085 and previous config saved to /var/cache/conftool/dbconfig/20241016-015840-ladsgroup.json
  • 01:50 eileen: tools upgraded from 62f2d170 to 68f64e43
  • 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70084 and previous config saved to /var/cache/conftool/dbconfig/20241016-014801-ladsgroup.json
  • 01:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P70083 and previous config saved to /var/cache/conftool/dbconfig/20241016-014333-ladsgroup.json
  • 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P70082 and previous config saved to /var/cache/conftool/dbconfig/20241016-013254-ladsgroup.json
  • 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70081 and previous config saved to /var/cache/conftool/dbconfig/20241016-012826-ladsgroup.json
  • 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70080 and previous config saved to /var/cache/conftool/dbconfig/20241016-011747-ladsgroup.json
  • 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T376905)', diff saved to https://phabricator.wikimedia.org/P70079 and previous config saved to /var/cache/conftool/dbconfig/20241016-011036-ladsgroup.json
  • 01:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 01:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70078 and previous config saved to /var/cache/conftool/dbconfig/20241016-011010-ladsgroup.json
  • 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70077 and previous config saved to /var/cache/conftool/dbconfig/20241016-005500-ladsgroup.json
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P70076 and previous config saved to /var/cache/conftool/dbconfig/20241016-003953-ladsgroup.json
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70075 and previous config saved to /var/cache/conftool/dbconfig/20241016-002446-ladsgroup.json
  • 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T376905)', diff saved to https://phabricator.wikimedia.org/P70074 and previous config saved to /var/cache/conftool/dbconfig/20241016-001629-ladsgroup.json
  • 00:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70073 and previous config saved to /var/cache/conftool/dbconfig/20241016-001604-ladsgroup.json
  • 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70072 and previous config saved to /var/cache/conftool/dbconfig/20241016-000057-ladsgroup.json

2024-10-15

  • 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T371742)', diff saved to https://phabricator.wikimedia.org/P70071 and previous config saved to /var/cache/conftool/dbconfig/20241015-235055-ladsgroup.json
  • 23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 23:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70070 and previous config saved to /var/cache/conftool/dbconfig/20241015-235017-ladsgroup.json
  • 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P70069 and previous config saved to /var/cache/conftool/dbconfig/20241015-234550-ladsgroup.json
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70068 and previous config saved to /var/cache/conftool/dbconfig/20241015-233510-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70067 and previous config saved to /var/cache/conftool/dbconfig/20241015-233043-ladsgroup.json
  • 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T376905)', diff saved to https://phabricator.wikimedia.org/P70066 and previous config saved to /var/cache/conftool/dbconfig/20241015-232456-ladsgroup.json
  • 23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 23:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70065 and previous config saved to /var/cache/conftool/dbconfig/20241015-232423-ladsgroup.json
  • 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P70064 and previous config saved to /var/cache/conftool/dbconfig/20241015-232003-ladsgroup.json
  • 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70063 and previous config saved to /var/cache/conftool/dbconfig/20241015-230916-ladsgroup.json
  • 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70062 and previous config saved to /var/cache/conftool/dbconfig/20241015-230456-ladsgroup.json
  • 22:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P70061 and previous config saved to /var/cache/conftool/dbconfig/20241015-225409-ladsgroup.json
  • 22:48 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70060 and previous config saved to /var/cache/conftool/dbconfig/20241015-223902-ladsgroup.json
  • 22:38 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T376905)', diff saved to https://phabricator.wikimedia.org/P70059 and previous config saved to /var/cache/conftool/dbconfig/20241015-222936-ladsgroup.json
  • 22:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70058 and previous config saved to /var/cache/conftool/dbconfig/20241015-222911-ladsgroup.json
  • 22:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 22:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70057 and previous config saved to /var/cache/conftool/dbconfig/20241015-221404-ladsgroup.json
  • 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70056 and previous config saved to /var/cache/conftool/dbconfig/20241015-221356-ladsgroup.json
  • 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P70055 and previous config saved to /var/cache/conftool/dbconfig/20241015-220316-ladsgroup.json
  • 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P70054 and previous config saved to /var/cache/conftool/dbconfig/20241015-215857-ladsgroup.json
  • 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70053 and previous config saved to /var/cache/conftool/dbconfig/20241015-215849-ladsgroup.json
  • 21:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 21:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P70052 and previous config saved to /var/cache/conftool/dbconfig/20241015-214811-ladsgroup.json
  • 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70051 and previous config saved to /var/cache/conftool/dbconfig/20241015-214350-ladsgroup.json
  • 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P70050 and previous config saved to /var/cache/conftool/dbconfig/20241015-214342-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T376905)', diff saved to https://phabricator.wikimedia.org/P70049 and previous config saved to /var/cache/conftool/dbconfig/20241015-213423-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 21:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 21:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P70048 and previous config saved to /var/cache/conftool/dbconfig/20241015-213305-ladsgroup.json
  • 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T371742)', diff saved to https://phabricator.wikimedia.org/P70047 and previous config saved to /var/cache/conftool/dbconfig/20241015-213227-ladsgroup.json
  • 21:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 21:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70046 and previous config saved to /var/cache/conftool/dbconfig/20241015-213203-ladsgroup.json
  • 21:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70045 and previous config saved to /var/cache/conftool/dbconfig/20241015-212835-ladsgroup.json
  • 21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
  • 21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2205.codfw.wmnet with reason: Sad
  • 21:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T370903)', diff saved to https://phabricator.wikimedia.org/P70044 and previous config saved to /var/cache/conftool/dbconfig/20241015-212431-ladsgroup.json
  • 21:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 21:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P70043 and previous config saved to /var/cache/conftool/dbconfig/20241015-211800-ladsgroup.json
  • 21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70042 and previous config saved to /var/cache/conftool/dbconfig/20241015-211656-ladsgroup.json
  • 21:04 cjming: end of UTC late backport window
  • 21:04 cjming@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 06m 51s)
  • 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P70041 and previous config saved to /var/cache/conftool/dbconfig/20241015-210149-ladsgroup.json
  • 20:59 cjming@deploy2002: cjming, matmarex: Continuing with sync
  • 20:59 cjming@deploy2002: cjming, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:57 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2194.codfw.wmnet onto db2205.codfw.wmnet
  • 20:57 cjming@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
  • 20:56 cjming@deploy2002: Finished scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) (duration: 12m 33s)
  • 20:51 cjming@deploy2002: cjming, pppery: Continuing with sync
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70040 and previous config saved to /var/cache/conftool/dbconfig/20241015-204642-ladsgroup.json
  • 20:46 cjming@deploy2002: cjming, pppery: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:43 cjming@deploy2002: Started scap sync-world: Backport for Redirect all namespace-in-Wikipedia cases to Wikipedia (T376923)
  • 20:42 cjming@deploy2002: Finished scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) (duration: 08m 50s)
  • 20:37 cjming@deploy2002: cjming, pppery: Continuing with sync
  • 20:35 cjming@deploy2002: cjming, pppery: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:33 cjming@deploy2002: Started scap sync-world: Backport for Missing.php: Improve detection of interwikis in certain cases (T363538)
  • 20:31 cjming@deploy2002: Finished scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php (duration: 10m 56s)
  • 20:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 20:27 cjming@deploy2002: ammarpad, cjming: Continuing with sync
  • 20:23 cjming@deploy2002: ammarpad, cjming: Backport for contactpages: Move stewards contactpage to MetaContactPages.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:20 cjming@deploy2002: Started scap sync-world: Backport for contactpages: Move stewards contactpage to MetaContactPages.php
  • 20:16 cjming@deploy2002: Finished scap sync-world: Backport for Remove legacy UI actions tracking (T376065) (duration: 12m 28s)
  • 20:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 20:12 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:11 cjming@deploy2002: ksarabia, cjming: Continuing with sync
  • 20:11 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:10 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
  • 20:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2081.codfw.wmnet with OS bullseye
  • 20:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 20:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 20:06 cjming@deploy2002: ksarabia, cjming: Backport for Remove legacy UI actions tracking (T376065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:03 cjming@deploy2002: Started scap sync-world: Backport for Remove legacy UI actions tracking (T376065)
  • 20:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:02 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 20:01 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 20:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:59 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:16 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.27 refs T375658
  • 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T371742)', diff saved to https://phabricator.wikimedia.org/P70039 and previous config saved to /var/cache/conftool/dbconfig/20241015-191345-ladsgroup.json
  • 19:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 19:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 19:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70038 and previous config saved to /var/cache/conftool/dbconfig/20241015-191322-ladsgroup.json
  • 19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70037 and previous config saved to /var/cache/conftool/dbconfig/20241015-190231-arnaudb.json
  • 18:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70036 and previous config saved to /var/cache/conftool/dbconfig/20241015-185814-ladsgroup.json
  • 18:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
  • 18:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70035 and previous config saved to /var/cache/conftool/dbconfig/20241015-184724-arnaudb.json
  • 18:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P70034 and previous config saved to /var/cache/conftool/dbconfig/20241015-184307-ladsgroup.json
  • 18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:39 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2082
  • 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2081
  • 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2083
  • 18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2083
  • 18:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2082
  • 18:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2081
  • 18:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
  • 18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2081-3 to codfw - jhancock@cumin2002"
  • 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P70033 and previous config saved to /var/cache/conftool/dbconfig/20241015-183218-arnaudb.json
  • 18:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70032 and previous config saved to /var/cache/conftool/dbconfig/20241015-182800-ladsgroup.json
  • 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70031 and previous config saved to /var/cache/conftool/dbconfig/20241015-181930-ladsgroup.json
  • 18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70030 and previous config saved to /var/cache/conftool/dbconfig/20241015-181711-arnaudb.json
  • 18:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367781)', diff saved to https://phabricator.wikimedia.org/P70029 and previous config saved to /var/cache/conftool/dbconfig/20241015-181455-arnaudb.json
  • 18:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 18:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70028 and previous config saved to /var/cache/conftool/dbconfig/20241015-181433-arnaudb.json
  • 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70027 and previous config saved to /var/cache/conftool/dbconfig/20241015-180423-ladsgroup.json
  • 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70026 and previous config saved to /var/cache/conftool/dbconfig/20241015-175926-arnaudb.json
  • 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P70025 and previous config saved to /var/cache/conftool/dbconfig/20241015-174916-ladsgroup.json
  • 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P70024 and previous config saved to /var/cache/conftool/dbconfig/20241015-174419-arnaudb.json
  • 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70023 and previous config saved to /var/cache/conftool/dbconfig/20241015-173409-ladsgroup.json
  • 17:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70022 and previous config saved to /var/cache/conftool/dbconfig/20241015-172912-arnaudb.json
  • 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T376905)', diff saved to https://phabricator.wikimedia.org/P70021 and previous config saved to /var/cache/conftool/dbconfig/20241015-172714-ladsgroup.json
  • 17:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367781)', diff saved to https://phabricator.wikimedia.org/P70020 and previous config saved to /var/cache/conftool/dbconfig/20241015-172657-arnaudb.json
  • 17:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70019 and previous config saved to /var/cache/conftool/dbconfig/20241015-172648-ladsgroup.json
  • 17:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 17:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 17:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 17:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70018 and previous config saved to /var/cache/conftool/dbconfig/20241015-172610-arnaudb.json
  • 17:13 swfrench@deploy2002: Finished scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934 (duration: 02m 47s)
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70017 and previous config saved to /var/cache/conftool/dbconfig/20241015-171141-ladsgroup.json
  • 17:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70016 and previous config saved to /var/cache/conftool/dbconfig/20241015-171103-arnaudb.json
  • 17:10 swfrench@deploy2002: Started scap sync-world: Testing scap after mediawiki-deployments.yaml format change - T370934
  • 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P70015 and previous config saved to /var/cache/conftool/dbconfig/20241015-165634-ladsgroup.json
  • 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T371742)', diff saved to https://phabricator.wikimedia.org/P70014 and previous config saved to /var/cache/conftool/dbconfig/20241015-165608-ladsgroup.json
  • 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P70013 and previous config saved to /var/cache/conftool/dbconfig/20241015-165556-arnaudb.json
  • 16:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 16:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70012 and previous config saved to /var/cache/conftool/dbconfig/20241015-165539-ladsgroup.json
  • 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70011 and previous config saved to /var/cache/conftool/dbconfig/20241015-164127-ladsgroup.json
  • 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70010 and previous config saved to /var/cache/conftool/dbconfig/20241015-164050-arnaudb.json
  • 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70009 and previous config saved to /var/cache/conftool/dbconfig/20241015-164032-ladsgroup.json
  • 16:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367781)', diff saved to https://phabricator.wikimedia.org/P70008 and previous config saved to /var/cache/conftool/dbconfig/20241015-163834-arnaudb.json
  • 16:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 16:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 16:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P70007 and previous config saved to /var/cache/conftool/dbconfig/20241015-163812-arnaudb.json
  • 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70006 and previous config saved to /var/cache/conftool/dbconfig/20241015-163419-ladsgroup.json
  • 16:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P70005 and previous config saved to /var/cache/conftool/dbconfig/20241015-163404-ladsgroup.json
  • 16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P70004 and previous config saved to /var/cache/conftool/dbconfig/20241015-162525-ladsgroup.json
  • 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P70003 and previous config saved to /var/cache/conftool/dbconfig/20241015-162305-arnaudb.json
  • 16:21 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2205.codfw.wmnet
  • 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P70002 and previous config saved to /var/cache/conftool/dbconfig/20241015-161934-ladsgroup.json
  • 16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P70001 and previous config saved to /var/cache/conftool/dbconfig/20241015-161858-ladsgroup.json
  • 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P70000 and previous config saved to /var/cache/conftool/dbconfig/20241015-161018-ladsgroup.json
  • 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P69999 and previous config saved to /var/cache/conftool/dbconfig/20241015-160758-arnaudb.json
  • 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P69998 and previous config saved to /var/cache/conftool/dbconfig/20241015-160351-ladsgroup.json
  • 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2205 T377164', diff saved to https://phabricator.wikimedia.org/P69997 and previous config saved to /var/cache/conftool/dbconfig/20241015-160106-ladsgroup.json
  • 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69996 and previous config saved to /var/cache/conftool/dbconfig/20241015-155251-arnaudb.json
  • 15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote db2209 to s3 primary and set section read-write T377164', diff saved to https://phabricator.wikimedia.org/P69995 and previous config saved to /var/cache/conftool/dbconfig/20241015-155240-ladsgroup.json
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69994 and previous config saved to /var/cache/conftool/dbconfig/20241015-154844-ladsgroup.json
  • 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T377164', diff saved to https://phabricator.wikimedia.org/P69993 and previous config saved to /var/cache/conftool/dbconfig/20241015-154834-ladsgroup.json
  • 15:48 Amir1: Starting s3 codfw failover from db2205 to db2209 - T377164
  • 15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367781)', diff saved to https://phabricator.wikimedia.org/P69992 and previous config saved to /var/cache/conftool/dbconfig/20241015-154318-arnaudb.json
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69991 and previous config saved to /var/cache/conftool/dbconfig/20241015-154256-arnaudb.json
  • 15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T377164', diff saved to https://phabricator.wikimedia.org/P69990 and previous config saved to /var/cache/conftool/dbconfig/20241015-154228-ladsgroup.json
  • 15:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
  • 15:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T377164
  • 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T376905)', diff saved to https://phabricator.wikimedia.org/P69989 and previous config saved to /var/cache/conftool/dbconfig/20241015-154027-ladsgroup.json
  • 15:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69988 and previous config saved to /var/cache/conftool/dbconfig/20241015-154002-ladsgroup.json
  • 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69987 and previous config saved to /var/cache/conftool/dbconfig/20241015-152749-arnaudb.json
  • 15:26 akosiaris: run gnt-cluster verify-disks after ganeti1034 forceful reboot
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69986 and previous config saved to /var/cache/conftool/dbconfig/20241015-152456-ladsgroup.json
  • 15:22 volans: force-rebooting ganeti1034 stuck due to drbd traces via mgmt
  • 15:19 akosiaris@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet
  • 15:17 akosiaris: drain ganeti1034 of VMs, hardware might be misbehaving
  • 15:16 akosiaris@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P69985 and previous config saved to /var/cache/conftool/dbconfig/20241015-151243-arnaudb.json
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P69984 and previous config saved to /var/cache/conftool/dbconfig/20241015-150948-ladsgroup.json
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69983 and previous config saved to /var/cache/conftool/dbconfig/20241015-145734-arnaudb.json
  • 14:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T367781)', diff saved to https://phabricator.wikimedia.org/P69982 and previous config saved to /var/cache/conftool/dbconfig/20241015-145517-arnaudb.json
  • 14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69981 and previous config saved to /var/cache/conftool/dbconfig/20241015-145453-arnaudb.json
  • 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69980 and previous config saved to /var/cache/conftool/dbconfig/20241015-145441-ladsgroup.json
  • 14:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
  • 14:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
  • 14:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T376905)', diff saved to https://phabricator.wikimedia.org/P69979 and previous config saved to /var/cache/conftool/dbconfig/20241015-144631-ladsgroup.json
  • 14:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69978 and previous config saved to /var/cache/conftool/dbconfig/20241015-144606-ladsgroup.json
  • 14:45 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 24s)
  • 14:43 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 46s)
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69977 and previous config saved to /var/cache/conftool/dbconfig/20241015-143946-arnaudb.json
  • 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T371742)', diff saved to https://phabricator.wikimedia.org/P69976 and previous config saved to /var/cache/conftool/dbconfig/20241015-143803-ladsgroup.json
  • 14:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 14:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69975 and previous config saved to /var/cache/conftool/dbconfig/20241015-143740-ladsgroup.json
  • 14:36 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
  • 14:35 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
  • 14:33 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
  • 14:31 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69974 and previous config saved to /var/cache/conftool/dbconfig/20241015-143059-ladsgroup.json
  • 14:29 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 14:28 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
  • 14:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:27 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:27 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:26 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P69973 and previous config saved to /var/cache/conftool/dbconfig/20241015-142439-arnaudb.json
  • 14:24 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
  • 14:24 urbanecm@deploy2002: Finished scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) (duration: 33m 23s)
  • 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69972 and previous config saved to /var/cache/conftool/dbconfig/20241015-142233-ladsgroup.json
  • 14:21 btullis@cumin1002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema
  • 14:19 urbanecm@deploy2002: urbanecm, matmarex: Continuing with sync
  • 14:17 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
  • 14:16 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P69971 and previous config saved to /var/cache/conftool/dbconfig/20241015-141552-ladsgroup.json
  • 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69970 and previous config saved to /var/cache/conftool/dbconfig/20241015-140932-arnaudb.json
  • 14:09 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P69969 and previous config saved to /var/cache/conftool/dbconfig/20241015-140726-ladsgroup.json
  • 14:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367781)', diff saved to https://phabricator.wikimedia.org/P69968 and previous config saved to /var/cache/conftool/dbconfig/20241015-140716-arnaudb.json
  • 14:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:08 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1020.eqiad.wmnet
  • 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69967 and previous config saved to /var/cache/conftool/dbconfig/20241015-140638-arnaudb.json
  • 14:05 btullis@cumin1002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema
  • 14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69966 and previous config saved to /var/cache/conftool/dbconfig/20241015-140045-ladsgroup.json
  • 14:00 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1020.eqiad.wmnet
  • 13:57 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1019.eqiad.wmnet
  • 13:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 13:54 urbanecm@deploy2002: urbanecm, matmarex: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T376905)', diff saved to https://phabricator.wikimedia.org/P69965 and previous config saved to /var/cache/conftool/dbconfig/20241015-135234-ladsgroup.json
  • 13:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69964 and previous config saved to /var/cache/conftool/dbconfig/20241015-135213-ladsgroup.json
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69963 and previous config saved to /var/cache/conftool/dbconfig/20241015-135208-ladsgroup.json
  • 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69962 and previous config saved to /var/cache/conftool/dbconfig/20241015-135131-arnaudb.json
  • 13:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1019.eqiad.wmnet
  • 13:50 urbanecm@deploy2002: Started scap sync-world: Backport for SkinComponentCopyright: Fix message existence check for history-copyright (T45646)
  • 13:48 herron@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69961 and previous config saved to /var/cache/conftool/dbconfig/20241015-133701-ladsgroup.json
  • 13:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P69960 and previous config saved to /var/cache/conftool/dbconfig/20241015-133624-arnaudb.json
  • 13:32 urbanecm@deploy2002: Finished scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 44s)
  • 13:27 urbanecm@deploy2002: migr, urbanecm, zabe: Continuing with sync
  • 13:26 urbanecm@deploy2002: migr, urbanecm, zabe: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:24 urbanecm@deploy2002: Started scap sync-world: Backport for eswiki: switch clearing link recommendations to PageSaveComplete hook (T372337), s7: Reduce revision-slots cache expiry to 60 seconds (T183490)
  • 13:23 urbanecm@deploy2002: Finished scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) (duration: 19m 25s)
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P69959 and previous config saved to /var/cache/conftool/dbconfig/20241015-132154-ladsgroup.json
  • 13:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69958 and previous config saved to /var/cache/conftool/dbconfig/20241015-132117-arnaudb.json
  • 13:19 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1018.eqiad.wmnet
  • 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T367781)', diff saved to https://phabricator.wikimedia.org/P69957 and previous config saved to /var/cache/conftool/dbconfig/20241015-131901-arnaudb.json
  • 13:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69956 and previous config saved to /var/cache/conftool/dbconfig/20241015-131839-arnaudb.json
  • 13:16 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Continuing with sync
  • 13:12 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1018.eqiad.wmnet
  • 13:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69955 and previous config saved to /var/cache/conftool/dbconfig/20241015-131122-ladsgroup.json
  • 13:11 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1017.eqiad.wmnet
  • 13:11 urbanecm@deploy2002: cyndywikime, daimona, urbanecm: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69954 and previous config saved to /var/cache/conftool/dbconfig/20241015-130647-ladsgroup.json
  • 13:04 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1017.eqiad.wmnet
  • 13:04 urbanecm@deploy2002: Started scap sync-world: Backport for [wikidatawiki] Enable the CampaignEvents extension (T375411), GrowthExperiments: update stream configuration to capture user id (T376833)
  • 13:03 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-presto1016.eqiad.wmnet
  • 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69953 and previous config saved to /var/cache/conftool/dbconfig/20241015-130332-arnaudb.json
  • 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T376905)', diff saved to https://phabricator.wikimedia.org/P69952 and previous config saved to /var/cache/conftool/dbconfig/20241015-125748-ladsgroup.json
  • 12:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1016.eqiad.wmnet
  • 12:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69951 and previous config saved to /var/cache/conftool/dbconfig/20241015-125615-ladsgroup.json
  • 12:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69950 and previous config saved to /var/cache/conftool/dbconfig/20241015-125203-ladsgroup.json
  • 12:50 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
  • 12:50 elukey: destroy old certs from puppetmaster1001's CA (parsoid.svc.{eqiad,codfw}.wmnet, debmonitor.discovery.wmnet)
  • 12:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P69949 and previous config saved to /var/cache/conftool/dbconfig/20241015-124825-arnaudb.json
  • 12:46 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 12:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69948 and previous config saved to /var/cache/conftool/dbconfig/20241015-124108-ladsgroup.json
  • 12:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69947 and previous config saved to /var/cache/conftool/dbconfig/20241015-123656-ladsgroup.json
  • 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69946 and previous config saved to /var/cache/conftool/dbconfig/20241015-123318-arnaudb.json
  • 12:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367781)', diff saved to https://phabricator.wikimedia.org/P69945 and previous config saved to /var/cache/conftool/dbconfig/20241015-123101-arnaudb.json
  • 12:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69944 and previous config saved to /var/cache/conftool/dbconfig/20241015-123039-arnaudb.json
  • 12:30 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:29 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69943 and previous config saved to /var/cache/conftool/dbconfig/20241015-122601-ladsgroup.json
  • 12:24 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:24 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T370903)', diff saved to https://phabricator.wikimedia.org/P69942 and previous config saved to /var/cache/conftool/dbconfig/20241015-122251-ladsgroup.json
  • 12:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P69941 and previous config saved to /var/cache/conftool/dbconfig/20241015-122149-ladsgroup.json
  • 12:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T371742)', diff saved to https://phabricator.wikimedia.org/P69940 and previous config saved to /var/cache/conftool/dbconfig/20241015-121706-ladsgroup.json
  • 12:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69939 and previous config saved to /var/cache/conftool/dbconfig/20241015-121532-arnaudb.json
  • 12:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69938 and previous config saved to /var/cache/conftool/dbconfig/20241015-121349-ladsgroup.json
  • 12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69937 and previous config saved to /var/cache/conftool/dbconfig/20241015-120642-ladsgroup.json
  • 12:03 brouberol@cumin1002: END (FAIL) - Cookbook sre.presto.reboot-workers (exit_code=99) for Presto an-presto cluster: Reboot Presto nodes
  • 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P69936 and previous config saved to /var/cache/conftool/dbconfig/20241015-120025-arnaudb.json
  • 11:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69935 and previous config saved to /var/cache/conftool/dbconfig/20241015-115842-ladsgroup.json
  • 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T376905)', diff saved to https://phabricator.wikimedia.org/P69934 and previous config saved to /var/cache/conftool/dbconfig/20241015-115630-ladsgroup.json
  • 11:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69933 and previous config saved to /var/cache/conftool/dbconfig/20241015-115606-ladsgroup.json
  • 11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69932 and previous config saved to /var/cache/conftool/dbconfig/20241015-114518-arnaudb.json
  • 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69931 and previous config saved to /var/cache/conftool/dbconfig/20241015-114336-ladsgroup.json
  • 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367781)', diff saved to https://phabricator.wikimedia.org/P69930 and previous config saved to /var/cache/conftool/dbconfig/20241015-114302-arnaudb.json
  • 11:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69929 and previous config saved to /var/cache/conftool/dbconfig/20241015-114240-arnaudb.json
  • 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69927 and previous config saved to /var/cache/conftool/dbconfig/20241015-114059-ladsgroup.json
  • 11:34 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69926 and previous config saved to /var/cache/conftool/dbconfig/20241015-112829-ladsgroup.json
  • 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69925 and previous config saved to /var/cache/conftool/dbconfig/20241015-112733-arnaudb.json
  • 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P69924 and previous config saved to /var/cache/conftool/dbconfig/20241015-112551-ladsgroup.json
  • 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P69923 and previous config saved to /var/cache/conftool/dbconfig/20241015-111226-arnaudb.json
  • 11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69922 and previous config saved to /var/cache/conftool/dbconfig/20241015-111045-ladsgroup.json
  • 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T371742)', diff saved to https://phabricator.wikimedia.org/P69921 and previous config saved to /var/cache/conftool/dbconfig/20241015-110741-ladsgroup.json
  • 11:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T376905)', diff saved to https://phabricator.wikimedia.org/P69920 and previous config saved to /var/cache/conftool/dbconfig/20241015-110132-ladsgroup.json
  • 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69919 and previous config saved to /var/cache/conftool/dbconfig/20241015-105719-arnaudb.json
  • 10:53 tappof: expand LVs on prometheus instances (k8s-mlserve and k8s-staging) T377196
  • 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367781)', diff saved to https://phabricator.wikimedia.org/P69918 and previous config saved to /var/cache/conftool/dbconfig/20241015-105301-arnaudb.json
  • 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 10:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69917 and previous config saved to /var/cache/conftool/dbconfig/20241015-105213-arnaudb.json
  • 10:38 brouberol@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 10:38 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
  • 10:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69915 and previous config saved to /var/cache/conftool/dbconfig/20241015-103706-arnaudb.json
  • 10:34 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
  • 10:30 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
  • 10:26 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
  • 10:25 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
  • 10:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
  • 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P69914 and previous config saved to /var/cache/conftool/dbconfig/20241015-102159-arnaudb.json
  • 10:21 brouberol@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
  • 10:14 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
  • 10:11 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 10:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69913 and previous config saved to /var/cache/conftool/dbconfig/20241015-100652-arnaudb.json
  • 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367781)', diff saved to https://phabricator.wikimedia.org/P69912 and previous config saved to /var/cache/conftool/dbconfig/20241015-100435-arnaudb.json
  • 10:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 10:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69911 and previous config saved to /var/cache/conftool/dbconfig/20241015-100413-arnaudb.json
  • 09:57 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 09:55 brouberol@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker
  • 09:52 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69910 and previous config saved to /var/cache/conftool/dbconfig/20241015-094906-arnaudb.json
  • 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P69909 and previous config saved to /var/cache/conftool/dbconfig/20241015-093359-arnaudb.json
  • 09:26 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69908 and previous config saved to /var/cache/conftool/dbconfig/20241015-091852-arnaudb.json
  • 09:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367781)', diff saved to https://phabricator.wikimedia.org/P69907 and previous config saved to /var/cache/conftool/dbconfig/20241015-091635-arnaudb.json
  • 09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69906 and previous config saved to /var/cache/conftool/dbconfig/20241015-091502-arnaudb.json
  • 09:07 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69905 and previous config saved to /var/cache/conftool/dbconfig/20241015-085955-arnaudb.json
  • 08:47 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
  • 08:46 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: init - oblivian@cumin2002
  • 08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P69903 and previous config saved to /var/cache/conftool/dbconfig/20241015-084448-arnaudb.json
  • 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69902 and previous config saved to /var/cache/conftool/dbconfig/20241015-082941-arnaudb.json
  • 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
  • 08:27 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T367781)', diff saved to https://phabricator.wikimedia.org/P69901 and previous config saved to /var/cache/conftool/dbconfig/20241015-082727-arnaudb.json
  • 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
  • 08:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 08:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 08:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69900 and previous config saved to /var/cache/conftool/dbconfig/20241015-082704-arnaudb.json
  • 08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69899 and previous config saved to /var/cache/conftool/dbconfig/20241015-081157-arnaudb.json
  • 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P69898 and previous config saved to /var/cache/conftool/dbconfig/20241015-075650-arnaudb.json
  • 07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69897 and previous config saved to /var/cache/conftool/dbconfig/20241015-074843-arnaudb.json
  • 07:47 hashar: Restarted Gerrit - T373897
  • 07:46 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897 (duration: 00m 09s)
  • 07:46 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit1003 - T373897
  • 07:42 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897 (duration: 00m 07s)
  • 07:42 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2002 - T373897
  • 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69896 and previous config saved to /var/cache/conftool/dbconfig/20241015-074143-arnaudb.json
  • 07:40 hashar@deploy2002: Finished deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897 (duration: 00m 07s)
  • 07:40 hashar@deploy2002: Started deploy [gerrit/gerrit@2f0c927]: Gerrit to 3.10.2 on gerrit2003 - T373897
  • 07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T367781)', diff saved to https://phabricator.wikimedia.org/P69895 and previous config saved to /var/cache/conftool/dbconfig/20241015-073928-arnaudb.json
  • 07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 07:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69894 and previous config saved to /var/cache/conftool/dbconfig/20241015-073906-arnaudb.json
  • 07:38 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
  • 07:38 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit[1003,2002-2003].wikimedia.org with reason: Gerrit 3.10.2 update
  • 07:35 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69893 and previous config saved to /var/cache/conftool/dbconfig/20241015-073338-arnaudb.json
  • 07:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69892 and previous config saved to /var/cache/conftool/dbconfig/20241015-072359-arnaudb.json
  • 07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69891 and previous config saved to /var/cache/conftool/dbconfig/20241015-071833-arnaudb.json
  • 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P69890 and previous config saved to /var/cache/conftool/dbconfig/20241015-070852-arnaudb.json
  • 07:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: post sunday p.age T368098', diff saved to https://phabricator.wikimedia.org/P69889 and previous config saved to /var/cache/conftool/dbconfig/20241015-070327-arnaudb.json
  • 06:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69888 and previous config saved to /var/cache/conftool/dbconfig/20241015-065345-arnaudb.json
  • 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T367781)', diff saved to https://phabricator.wikimedia.org/P69887 and previous config saved to /var/cache/conftool/dbconfig/20241015-065130-arnaudb.json
  • 06:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 06:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 06:30 kart_: Updated MinT to 2024-10-11-113932-production
  • 06:27 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:18 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:16 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:08 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:38 _joe_: restart tomcat on idp1004
  • 05:35 _joe_: restart tomcat on idp2004
  • 05:15 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:10 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:00 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.24 (duration: 00m 56s)
  • 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658 (duration: 48m 30s)
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.27 refs T375658
  • 02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69885 and previous config saved to /var/cache/conftool/dbconfig/20241015-024037-ladsgroup.json
  • 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69884 and previous config saved to /var/cache/conftool/dbconfig/20241015-022530-ladsgroup.json
  • 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P69883 and previous config saved to /var/cache/conftool/dbconfig/20241015-021023-ladsgroup.json
  • 01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69882 and previous config saved to /var/cache/conftool/dbconfig/20241015-015516-ladsgroup.json
  • 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T376905)', diff saved to https://phabricator.wikimedia.org/P69881 and previous config saved to /var/cache/conftool/dbconfig/20241015-014831-ladsgroup.json
  • 01:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 01:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69880 and previous config saved to /var/cache/conftool/dbconfig/20241015-014803-ladsgroup.json
  • 01:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69879 and previous config saved to /var/cache/conftool/dbconfig/20241015-013257-ladsgroup.json
  • 01:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P69878 and previous config saved to /var/cache/conftool/dbconfig/20241015-011749-ladsgroup.json
  • 01:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69877 and previous config saved to /var/cache/conftool/dbconfig/20241015-010242-ladsgroup.json
  • 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T376905)', diff saved to https://phabricator.wikimedia.org/P69876 and previous config saved to /var/cache/conftool/dbconfig/20241015-005551-ladsgroup.json
  • 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69875 and previous config saved to /var/cache/conftool/dbconfig/20241015-005546-ladsgroup.json
  • 00:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69874 and previous config saved to /var/cache/conftool/dbconfig/20241015-005525-ladsgroup.json
  • 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69873 and previous config saved to /var/cache/conftool/dbconfig/20241015-004039-ladsgroup.json
  • 00:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69872 and previous config saved to /var/cache/conftool/dbconfig/20241015-004018-ladsgroup.json
  • 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69871 and previous config saved to /var/cache/conftool/dbconfig/20241015-002531-ladsgroup.json
  • 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P69870 and previous config saved to /var/cache/conftool/dbconfig/20241015-002511-ladsgroup.json
  • 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69869 and previous config saved to /var/cache/conftool/dbconfig/20241015-001024-ladsgroup.json
  • 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69868 and previous config saved to /var/cache/conftool/dbconfig/20241015-001004-ladsgroup.json
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T376905)', diff saved to https://phabricator.wikimedia.org/P69867 and previous config saved to /var/cache/conftool/dbconfig/20241015-000304-ladsgroup.json
  • 00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69866 and previous config saved to /var/cache/conftool/dbconfig/20241015-000236-ladsgroup.json

2024-10-14

  • 23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69865 and previous config saved to /var/cache/conftool/dbconfig/20241014-234729-ladsgroup.json
  • 23:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P69864 and previous config saved to /var/cache/conftool/dbconfig/20241014-233222-ladsgroup.json
  • 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T370903)', diff saved to https://phabricator.wikimedia.org/P69863 and previous config saved to /var/cache/conftool/dbconfig/20241014-232857-ladsgroup.json
  • 23:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 23:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69862 and previous config saved to /var/cache/conftool/dbconfig/20241014-232835-ladsgroup.json
  • 23:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69861 and previous config saved to /var/cache/conftool/dbconfig/20241014-231715-ladsgroup.json
  • 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69860 and previous config saved to /var/cache/conftool/dbconfig/20241014-231328-ladsgroup.json
  • 23:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T376905)', diff saved to https://phabricator.wikimedia.org/P69859 and previous config saved to /var/cache/conftool/dbconfig/20241014-230903-ladsgroup.json
  • 23:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 23:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 23:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69858 and previous config saved to /var/cache/conftool/dbconfig/20241014-230838-ladsgroup.json
  • 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69857 and previous config saved to /var/cache/conftool/dbconfig/20241014-225818-ladsgroup.json
  • 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69856 and previous config saved to /var/cache/conftool/dbconfig/20241014-225528-ladsgroup.json
  • 22:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69855 and previous config saved to /var/cache/conftool/dbconfig/20241014-225331-ladsgroup.json
  • 22:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69854 and previous config saved to /var/cache/conftool/dbconfig/20241014-224311-ladsgroup.json
  • 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69853 and previous config saved to /var/cache/conftool/dbconfig/20241014-224022-ladsgroup.json
  • 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P69852 and previous config saved to /var/cache/conftool/dbconfig/20241014-223824-ladsgroup.json
  • 22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69851 and previous config saved to /var/cache/conftool/dbconfig/20241014-222515-ladsgroup.json
  • 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69850 and previous config saved to /var/cache/conftool/dbconfig/20241014-222317-ladsgroup.json
  • 22:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69849 and previous config saved to /var/cache/conftool/dbconfig/20241014-222009-ladsgroup.json
  • 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T376905)', diff saved to https://phabricator.wikimedia.org/P69848 and previous config saved to /var/cache/conftool/dbconfig/20241014-221508-ladsgroup.json
  • 22:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 22:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69847 and previous config saved to /var/cache/conftool/dbconfig/20241014-221443-ladsgroup.json
  • 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69846 and previous config saved to /var/cache/conftool/dbconfig/20241014-221008-ladsgroup.json
  • 22:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69845 and previous config saved to /var/cache/conftool/dbconfig/20241014-220504-ladsgroup.json
  • 22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T370903)', diff saved to https://phabricator.wikimedia.org/P69844 and previous config saved to /var/cache/conftool/dbconfig/20241014-220134-ladsgroup.json
  • 22:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 22:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69843 and previous config saved to /var/cache/conftool/dbconfig/20241014-215936-ladsgroup.json
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69842 and previous config saved to /var/cache/conftool/dbconfig/20241014-214958-ladsgroup.json
  • 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T371742)', diff saved to https://phabricator.wikimedia.org/P69841 and previous config saved to /var/cache/conftool/dbconfig/20241014-214515-ladsgroup.json
  • 21:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 21:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P69840 and previous config saved to /var/cache/conftool/dbconfig/20241014-214429-ladsgroup.json
  • 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367856)', diff saved to https://phabricator.wikimedia.org/P69839 and previous config saved to /var/cache/conftool/dbconfig/20241014-213902-ladsgroup.json
  • 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69838 and previous config saved to /var/cache/conftool/dbconfig/20241014-213453-ladsgroup.json
  • 21:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69837 and previous config saved to /var/cache/conftool/dbconfig/20241014-212922-ladsgroup.json
  • 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T376905)', diff saved to https://phabricator.wikimedia.org/P69836 and previous config saved to /var/cache/conftool/dbconfig/20241014-212001-ladsgroup.json
  • 21:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 21:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69835 and previous config saved to /var/cache/conftool/dbconfig/20241014-211937-ladsgroup.json
  • 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69834 and previous config saved to /var/cache/conftool/dbconfig/20241014-210430-ladsgroup.json
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P69833 and previous config saved to /var/cache/conftool/dbconfig/20241014-204923-ladsgroup.json
  • 20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69832 and previous config saved to /var/cache/conftool/dbconfig/20241014-203416-ladsgroup.json
  • 20:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T376905)', diff saved to https://phabricator.wikimedia.org/P69831 and previous config saved to /var/cache/conftool/dbconfig/20241014-202504-ladsgroup.json
  • 20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69830 and previous config saved to /var/cache/conftool/dbconfig/20241014-202439-ladsgroup.json
  • 20:21 TheresNoTime: UTC late backport window done
  • 20:18 samtar@deploy2002: Finished scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) (duration: 08m 14s)
  • 20:14 samtar@deploy2002: samtar, pppery: Continuing with sync
  • 20:12 samtar@deploy2002: samtar, pppery: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:10 samtar@deploy2002: Started scap sync-world: Backport for Missing.php: Redirect Scots Wiktionary to Scots Wikipedia (T249648)
  • 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69829 and previous config saved to /var/cache/conftool/dbconfig/20241014-200932-ladsgroup.json
  • 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P69828 and previous config saved to /var/cache/conftool/dbconfig/20241014-195425-ladsgroup.json
  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69827 and previous config saved to /var/cache/conftool/dbconfig/20241014-193918-ladsgroup.json
  • 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T376905)', diff saved to https://phabricator.wikimedia.org/P69826 and previous config saved to /var/cache/conftool/dbconfig/20241014-192956-ladsgroup.json
  • 19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 18:57 aqu@deploy2002: Finished deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8] (duration: 00m 29s)
  • 18:57 aqu@deploy2002: Started deploy [airflow-dags/analytics@a1a70ce]: Deploy last version for Refine staging [airflow-dags@a1a70ce8]
  • 18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69825 and previous config saved to /var/cache/conftool/dbconfig/20241014-185225-ladsgroup.json
  • 18:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8] (duration: 00m 13s)
  • 18:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@a1a70ce]: Deploy last fixes on Refine staging [airflow-dags@a1a70ce8]
  • 18:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69824 and previous config saved to /var/cache/conftool/dbconfig/20241014-183718-ladsgroup.json
  • 18:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P69823 and previous config saved to /var/cache/conftool/dbconfig/20241014-182211-ladsgroup.json
  • 18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69822 and previous config saved to /var/cache/conftool/dbconfig/20241014-180704-ladsgroup.json
  • 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P69821 and previous config saved to /var/cache/conftool/dbconfig/20241014-170647-ladsgroup.json
  • 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69820 and previous config saved to /var/cache/conftool/dbconfig/20241014-170123-ladsgroup.json
  • 16:51 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
  • 16:50 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
  • 16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69819 and previous config saved to /var/cache/conftool/dbconfig/20241014-164616-ladsgroup.json
  • 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P69818 and previous config saved to /var/cache/conftool/dbconfig/20241014-163109-ladsgroup.json
  • 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69817 and previous config saved to /var/cache/conftool/dbconfig/20241014-161602-ladsgroup.json
  • 16:03 sergi0: Running `sgimeno@mwmaint2002:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
  • 15:52 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 15:46 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 15:16 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P69816 and previous config saved to /var/cache/conftool/dbconfig/20241014-151546-ladsgroup.json
  • 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:15 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69815 and previous config saved to /var/cache/conftool/dbconfig/20241014-151521-ladsgroup.json
  • 15:07 elukey@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 elukey@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69814 and previous config saved to /var/cache/conftool/dbconfig/20241014-150014-ladsgroup.json
  • 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P69813 and previous config saved to /var/cache/conftool/dbconfig/20241014-144507-ladsgroup.json
  • 14:43 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 14:43 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:41 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:39 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69812 and previous config saved to /var/cache/conftool/dbconfig/20241014-143000-ladsgroup.json
  • 14:16 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1177.eqiad.wmnet
  • 14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
  • 14:16 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1177.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
  • 14:12 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
  • 14:12 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:10 Lucas_WMDE: [untruncated duration: 06m 48s]
  • 14:09 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) (duration: 0
  • 14:07 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1177.eqiad.wmnet
  • 14:07 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1176.eqiad.wmnet
  • 14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
  • 14:06 stevemunene@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1176.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1002"
  • 14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync
  • 14:04 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176) synced to
  • 14:03 stevemunene@cumin1002: START - Cookbook sre.dns.netbox
  • 14:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for refactor(tests): don't use per-method coverage annotation, refactor(HomepageHooks): extract method for simpler modifyability, Clear LinkRecommendation suggestions on page save (T364341 T372337), Run fixLinkRecommendationData even when disabled in CC (T373176)
  • 13:58 stevemunene@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-worker1176.eqiad.wmnet
  • 13:46 ladsgroup@deploy2002: Finished scap sync-world: Backport for Update interwiki.php (duration: 07m 00s)
  • 13:45 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@fbcf880]: T375480 (duration: 01m 07s)
  • 13:44 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@fbcf880]: T375480
  • 13:41 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 13:41 ladsgroup@deploy2002: ladsgroup: Backport for Update interwiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Update interwiki.php
  • 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1002.eqiad.wmnet
  • 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 13:34 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 13:31 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P69811 and previous config saved to /var/cache/conftool/dbconfig/20241014-132944-ladsgroup.json
  • 13:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69810 and previous config saved to /var/cache/conftool/dbconfig/20241014-132918-ladsgroup.json
  • 13:26 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1002.eqiad.wmnet
  • 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-etcd1001.eqiad.wmnet
  • 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 13:26 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-etcd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69809 and previous config saved to /var/cache/conftool/dbconfig/20241014-132409-ladsgroup.json
  • 13:22 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 13:18 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-etcd1001.eqiad.wmnet
  • 13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
  • 13:16 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1002.eqiad.wmnet with reason: about to decom
  • 13:15 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
  • 13:15 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on aux-k8s-etcd1001.eqiad.wmnet with reason: about to decom
  • 13:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69808 and previous config saved to /var/cache/conftool/dbconfig/20241014-131411-ladsgroup.json
  • 13:13 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) (duration: 10m 19s)
  • 13:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69807 and previous config saved to /var/cache/conftool/dbconfig/20241014-130904-ladsgroup.json
  • 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [uawikimedia] Enable the CampaignEvents extension (T376695)
  • 12:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P69806 and previous config saved to /var/cache/conftool/dbconfig/20241014-125904-ladsgroup.json
  • 12:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69805 and previous config saved to /var/cache/conftool/dbconfig/20241014-125358-ladsgroup.json
  • 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69804 and previous config saved to /var/cache/conftool/dbconfig/20241014-124554-arnaudb.json
  • 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 12:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69803 and previous config saved to /var/cache/conftool/dbconfig/20241014-124532-arnaudb.json
  • 12:44 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 12s)
  • 12:44 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
  • 12:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69802 and previous config saved to /var/cache/conftool/dbconfig/20241014-124357-ladsgroup.json
  • 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-worker1001.eqiad.wmnet
  • 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 12:41 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-worker1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 12:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69801 and previous config saved to /var/cache/conftool/dbconfig/20241014-123853-ladsgroup.json
  • 12:37 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 12:32 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-worker1001.eqiad.wmnet
  • 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aux-k8s-ctrl1001.eqiad.wmnet
  • 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 12:32 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aux-k8s-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin1002"
  • 12:30 hnowlan: removed all aqsv1 service components from aqs* hosts
  • 12:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69800 and previous config saved to /var/cache/conftool/dbconfig/20241014-123025-arnaudb.json
  • 12:28 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 12:23 elukey@cumin1002: START - Cookbook sre.hosts.decommission for hosts aux-k8s-ctrl1001.eqiad.wmnet
  • 12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1001.eqiad.wmnet
  • 12:22 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1001.eqiad.wmnet
  • 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P69799 and previous config saved to /var/cache/conftool/dbconfig/20241014-121518-arnaudb.json
  • 12:09 elukey: increase etcd k8s aux cluster from 3 -> 5 - T344230
  • 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69798 and previous config saved to /var/cache/conftool/dbconfig/20241014-120011-arnaudb.json
  • 11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private address - aborrero@cumin1002"
  • 11:59 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb2004-dev cloud-private address - aborrero@cumin1002"
  • 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T367781)', diff saved to https://phabricator.wikimedia.org/P69797 and previous config saved to /var/cache/conftool/dbconfig/20241014-115755-arnaudb.json
  • 11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69796 and previous config saved to /var/cache/conftool/dbconfig/20241014-115732-arnaudb.json
  • 11:56 Dreamy_Jazz: Started time limited scan on enwiki for MediaModeration - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 11:56 aborrero@cumin1002: START - Cookbook sre.dns.netbox
  • 11:52 btullis@cumin1002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2194.codfw.wmnet onto db2227.codfw.wmnet
  • 11:50 btullis@cumin1002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 11:50 hnowlan@deploy2002: Finished deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761) (duration: 15m 38s)
  • 11:45 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P69794 and previous config saved to /var/cache/conftool/dbconfig/20241014-114341-ladsgroup.json
  • 11:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69793 and previous config saved to /var/cache/conftool/dbconfig/20241014-114316-ladsgroup.json
  • 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69792 and previous config saved to /var/cache/conftool/dbconfig/20241014-114225-arnaudb.json
  • 11:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69791 and previous config saved to /var/cache/conftool/dbconfig/20241014-113941-arnaudb.json
  • 11:34 hnowlan@deploy2002: Started deploy [restbase/deploy@26112d4]: Remove unused AQS components. Add bdrwiki (T371761)
  • 11:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 08s)
  • 11:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
  • 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69790 and previous config saved to /var/cache/conftool/dbconfig/20241014-112809-ladsgroup.json
  • 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P69789 and previous config saved to /var/cache/conftool/dbconfig/20241014-112719-arnaudb.json
  • 11:26 claime: Running ./redis-check-aof --fix on rdb1014 tcp_6379 instance - T376961
  • 11:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69788 and previous config saved to /var/cache/conftool/dbconfig/20241014-112434-arnaudb.json
  • 11:16 ladsgroup@deploy2002: Finished scap sync-world: Creating bclwikisource (T377084) (duration: 06m 49s)
  • 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P69787 and previous config saved to /var/cache/conftool/dbconfig/20241014-111302-ladsgroup.json
  • 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69786 and previous config saved to /var/cache/conftool/dbconfig/20241014-111211-arnaudb.json
  • 11:10 ladsgroup@deploy2002: Started scap sync-world: Creating bclwikisource (T377084)
  • 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T367781)', diff saved to https://phabricator.wikimedia.org/P69785 and previous config saved to /var/cache/conftool/dbconfig/20241014-110956-arnaudb.json
  • 11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69784 and previous config saved to /var/cache/conftool/dbconfig/20241014-110933-arnaudb.json
  • 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69783 and previous config saved to /var/cache/conftool/dbconfig/20241014-110927-arnaudb.json
  • 11:07 ladsgroup@deploy2002: Finished scap sync-world: Creating ibawiki (T376568) (duration: 06m 45s)
  • 11:05 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
  • 11:01 ladsgroup@deploy2002: Started scap sync-world: Creating ibawiki (T376568)
  • 11:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
  • 10:58 ladsgroup@deploy2002: Finished scap sync-world: Creating annwiki (T376332) (duration: 06m 45s)
  • 10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69782 and previous config saved to /var/cache/conftool/dbconfig/20241014-105755-ladsgroup.json
  • 10:55 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69781 and previous config saved to /var/cache/conftool/dbconfig/20241014-105426-arnaudb.json
  • 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69780 and previous config saved to /var/cache/conftool/dbconfig/20241014-105421-arnaudb.json
  • 10:52 ladsgroup@deploy2002: Started scap sync-world: Creating annwiki (T376332)
  • 10:51 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T376905)', diff saved to https://phabricator.wikimedia.org/P69779 and previous config saved to /var/cache/conftool/dbconfig/20241014-104941-ladsgroup.json
  • 10:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69778 and previous config saved to /var/cache/conftool/dbconfig/20241014-104916-ladsgroup.json
  • 10:48 ladsgroup@deploy2002: Finished scap sync-world: Creating tddwiki (T375422) (duration: 06m 46s)
  • 10:44 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
  • 10:44 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert1002.wikimedia.org with reason: init - oblivian@cumin2002
  • 10:42 ladsgroup@deploy2002: Started scap sync-world: Creating tddwiki (T375422)
  • 10:40 ladsgroup@deploy2002: Finished scap sync-world: Creating nrwiki (T375087) (duration: 06m 54s)
  • 10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P69777 and previous config saved to /var/cache/conftool/dbconfig/20241014-103919-arnaudb.json
  • 10:35 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
  • 10:35 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
  • 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69776 and previous config saved to /var/cache/conftool/dbconfig/20241014-103410-ladsgroup.json
  • 10:33 ladsgroup@deploy2002: Started scap sync-world: Creating nrwiki (T375087)
  • 10:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) (duration: 06m 45s)
  • 10:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:27 ladsgroup@deploy2002: ladsgroup: Backport for Add namespace translations for Tai Nüa (tdd) (T375421) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:25 ladsgroup@deploy2002: Started scap sync-world: Backport for Add namespace translations for Tai Nüa (tdd) (T375421)
  • 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69775 and previous config saved to /var/cache/conftool/dbconfig/20241014-102412-arnaudb.json
  • 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69774 and previous config saved to /var/cache/conftool/dbconfig/20241014-102256-arnaudb.json
  • 10:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69773 and previous config saved to /var/cache/conftool/dbconfig/20241014-102234-arnaudb.json
  • 10:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P69772 and previous config saved to /var/cache/conftool/dbconfig/20241014-101903-ladsgroup.json
  • 10:17 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2194.codfw.wmnet onto db2227.codfw.wmnet
  • 10:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69771 and previous config saved to /var/cache/conftool/dbconfig/20241014-101354-ladsgroup.json
  • 10:13 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1004.wikimedia.org
  • 10:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69770 and previous config saved to /var/cache/conftool/dbconfig/20241014-101246-ladsgroup.json
  • 10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69769 and previous config saved to /var/cache/conftool/dbconfig/20241014-100727-arnaudb.json
  • 10:06 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1004.wikimedia.org
  • 10:06 eoghan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.wikimedia.org
  • 10:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69768 and previous config saved to /var/cache/conftool/dbconfig/20241014-100356-ladsgroup.json
  • 10:00 akosiaris: powercycle rdb1014 T376961
  • 10:00 eoghan@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists2001.wikimedia.org
  • 10:00 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
  • 10:00 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
  • 10:00 ladsgroup@deploy2002: Finished scap sync-world: Creating rskwiki (T374963) (duration: 18m 38s)
  • 09:59 oblivian@cumin2002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
  • 09:59 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert2002.wikimedia.org with reason: init - oblivian@cumin2002
  • 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69767 and previous config saved to /var/cache/conftool/dbconfig/20241014-095354-arnaudb.json
  • 09:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 09:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69766 and previous config saved to /var/cache/conftool/dbconfig/20241014-095331-arnaudb.json
  • 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P69765 and previous config saved to /var/cache/conftool/dbconfig/20241014-095220-arnaudb.json
  • 09:41 ladsgroup@deploy2002: Started scap sync-world: Creating rskwiki (T374963)
  • 09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69764 and previous config saved to /var/cache/conftool/dbconfig/20241014-093824-arnaudb.json
  • 09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69763 and previous config saved to /var/cache/conftool/dbconfig/20241014-093713-arnaudb.json
  • 09:36 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T367781)', diff saved to https://phabricator.wikimedia.org/P69762 and previous config saved to /var/cache/conftool/dbconfig/20241014-093459-arnaudb.json
  • 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69761 and previous config saved to /var/cache/conftool/dbconfig/20241014-093418-arnaudb.json
  • 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69760 and previous config saved to /var/cache/conftool/dbconfig/20241014-092317-arnaudb.json
  • 09:21 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69759 and previous config saved to /var/cache/conftool/dbconfig/20241014-091911-arnaudb.json
  • 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69758 and previous config saved to /var/cache/conftool/dbconfig/20241014-090810-arnaudb.json
  • 09:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P69757 and previous config saved to /var/cache/conftool/dbconfig/20241014-090403-arnaudb.json
  • 09:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P69756 and previous config saved to /var/cache/conftool/dbconfig/20241014-090340-ladsgroup.json
  • 09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 09:01 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
  • 08:58 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:55 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
  • 08:55 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
  • 08:49 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
  • 08:49 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
  • 08:49 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 08:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69755 and previous config saved to /var/cache/conftool/dbconfig/20241014-084856-arnaudb.json
  • 08:48 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T367781)', diff saved to https://phabricator.wikimedia.org/P69754 and previous config saved to /var/cache/conftool/dbconfig/20241014-084643-arnaudb.json
  • 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69753 and previous config saved to /var/cache/conftool/dbconfig/20241014-084620-arnaudb.json
  • 08:43 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
  • 08:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:40 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69752 and previous config saved to /var/cache/conftool/dbconfig/20241014-083113-arnaudb.json
  • 08:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P69751 and previous config saved to /var/cache/conftool/dbconfig/20241014-081606-arnaudb.json
  • 08:13 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2003.codfw.wmnet
  • 08:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:12 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:10 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2004.codfw.wmnet
  • 08:08 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2003.codfw.wmnet
  • 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69750 and previous config saved to /var/cache/conftool/dbconfig/20241014-080744-arnaudb.json
  • 08:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 08:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69749 and previous config saved to /var/cache/conftool/dbconfig/20241014-080721-arnaudb.json
  • 08:07 jayme@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kubestagemaster2005.codfw.wmnet
  • 08:02 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2004.codfw.wmnet
  • 08:01 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
  • 08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69748 and previous config saved to /var/cache/conftool/dbconfig/20241014-080059-arnaudb.json
  • 08:00 jayme@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM kubestagemaster2005.codfw.wmnet
  • 08:00 jayme@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM kubestagemaster2005.codfw.wmnet
  • 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T367781)', diff saved to https://phabricator.wikimedia.org/P69747 and previous config saved to /var/cache/conftool/dbconfig/20241014-075845-arnaudb.json
  • 07:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 07:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 07:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69746 and previous config saved to /var/cache/conftool/dbconfig/20241014-075823-arnaudb.json
  • 07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69745 and previous config saved to /var/cache/conftool/dbconfig/20241014-075214-arnaudb.json
  • 07:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69744 and previous config saved to /var/cache/conftool/dbconfig/20241014-074317-arnaudb.json
  • 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69743 and previous config saved to /var/cache/conftool/dbconfig/20241014-073707-arnaudb.json
  • 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P69742 and previous config saved to /var/cache/conftool/dbconfig/20241014-072810-arnaudb.json
  • 07:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69741 and previous config saved to /var/cache/conftool/dbconfig/20241014-072201-arnaudb.json
  • 07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69740 and previous config saved to /var/cache/conftool/dbconfig/20241014-071302-arnaudb.json
  • 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T367781)', diff saved to https://phabricator.wikimedia.org/P69739 and previous config saved to /var/cache/conftool/dbconfig/20241014-071048-arnaudb.json
  • 07:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69738 and previous config saved to /var/cache/conftool/dbconfig/20241014-071026-arnaudb.json
  • 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69737 and previous config saved to /var/cache/conftool/dbconfig/20241014-065519-arnaudb.json
  • 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P69736 and previous config saved to /var/cache/conftool/dbconfig/20241014-064012-arnaudb.json
  • 06:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69735 and previous config saved to /var/cache/conftool/dbconfig/20241014-062505-arnaudb.json
  • 06:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367781)', diff saved to https://phabricator.wikimedia.org/P69734 and previous config saved to /var/cache/conftool/dbconfig/20241014-062249-arnaudb.json
  • 06:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 06:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 06:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69733 and previous config saved to /var/cache/conftool/dbconfig/20241014-062135-arnaudb.json
  • 06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 06:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 06:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 04:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 04:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69732 and previous config saved to /var/cache/conftool/dbconfig/20241014-042443-ladsgroup.json
  • 04:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69731 and previous config saved to /var/cache/conftool/dbconfig/20241014-040936-ladsgroup.json
  • 03:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69730 and previous config saved to /var/cache/conftool/dbconfig/20241014-035429-ladsgroup.json
  • 03:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69729 and previous config saved to /var/cache/conftool/dbconfig/20241014-033922-ladsgroup.json
  • 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T376905)', diff saved to https://phabricator.wikimedia.org/P69728 and previous config saved to /var/cache/conftool/dbconfig/20241014-033237-ladsgroup.json
  • 03:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 03:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 03:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69727 and previous config saved to /var/cache/conftool/dbconfig/20241014-032710-ladsgroup.json
  • 03:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69726 and previous config saved to /var/cache/conftool/dbconfig/20241014-031203-ladsgroup.json
  • 02:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P69725 and previous config saved to /var/cache/conftool/dbconfig/20241014-025656-ladsgroup.json
  • 02:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69724 and previous config saved to /var/cache/conftool/dbconfig/20241014-024149-ladsgroup.json
  • 02:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T376905)', diff saved to https://phabricator.wikimedia.org/P69723 and previous config saved to /var/cache/conftool/dbconfig/20241014-023616-ladsgroup.json
  • 02:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69722 and previous config saved to /var/cache/conftool/dbconfig/20241014-023551-ladsgroup.json
  • 02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69721 and previous config saved to /var/cache/conftool/dbconfig/20241014-022044-ladsgroup.json
  • 02:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P69720 and previous config saved to /var/cache/conftool/dbconfig/20241014-020537-ladsgroup.json
  • 01:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69719 and previous config saved to /var/cache/conftool/dbconfig/20241014-015030-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T376905)', diff saved to https://phabricator.wikimedia.org/P69718 and previous config saved to /var/cache/conftool/dbconfig/20241014-014435-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69717 and previous config saved to /var/cache/conftool/dbconfig/20241014-014410-ladsgroup.json
  • 01:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69716 and previous config saved to /var/cache/conftool/dbconfig/20241014-012903-ladsgroup.json
  • 01:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P69715 and previous config saved to /var/cache/conftool/dbconfig/20241014-011356-ladsgroup.json
  • 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69714 and previous config saved to /var/cache/conftool/dbconfig/20241014-005849-ladsgroup.json
  • 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T376905)', diff saved to https://phabricator.wikimedia.org/P69713 and previous config saved to /var/cache/conftool/dbconfig/20241014-005056-ladsgroup.json
  • 00:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69712 and previous config saved to /var/cache/conftool/dbconfig/20241014-005042-ladsgroup.json
  • 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69711 and previous config saved to /var/cache/conftool/dbconfig/20241014-003534-ladsgroup.json
  • 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P69710 and previous config saved to /var/cache/conftool/dbconfig/20241014-002027-ladsgroup.json
  • 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69709 and previous config saved to /var/cache/conftool/dbconfig/20241014-000520-ladsgroup.json

2024-10-13

  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T376905)', diff saved to https://phabricator.wikimedia.org/P69708 and previous config saved to /var/cache/conftool/dbconfig/20241013-235726-ladsgroup.json
  • 23:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69707 and previous config saved to /var/cache/conftool/dbconfig/20241013-235701-ladsgroup.json
  • 23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69706 and previous config saved to /var/cache/conftool/dbconfig/20241013-234154-ladsgroup.json
  • 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P69705 and previous config saved to /var/cache/conftool/dbconfig/20241013-232647-ladsgroup.json
  • 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69704 and previous config saved to /var/cache/conftool/dbconfig/20241013-231140-ladsgroup.json
  • 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T376905)', diff saved to https://phabricator.wikimedia.org/P69703 and previous config saved to /var/cache/conftool/dbconfig/20241013-230403-ladsgroup.json
  • 23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
  • 12:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: maintenance
  • 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2147', diff saved to https://phabricator.wikimedia.org/P69702 and previous config saved to /var/cache/conftool/dbconfig/20241013-121154-arnaudb.json
  • 10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69701 and previous config saved to /var/cache/conftool/dbconfig/20241013-102205-ladsgroup.json
  • 10:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69700 and previous config saved to /var/cache/conftool/dbconfig/20241013-100658-ladsgroup.json
  • 09:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P69699 and previous config saved to /var/cache/conftool/dbconfig/20241013-095151-ladsgroup.json
  • 09:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69698 and previous config saved to /var/cache/conftool/dbconfig/20241013-093644-ladsgroup.json

2024-10-11

  • 22:18 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
  • 21:38 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[3-5]*} and (A:cephosd)
  • 21:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1002.eqiad.wmnet
  • 21:26 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1002.eqiad.wmnet
  • 21:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 21:14 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:49 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
  • 16:40 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 00m 42s)
  • 16:39 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
  • 16:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0 (duration: 01m 06s)
  • 16:38 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@c1d2914]: bump section topics to v0.16.0
  • 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
  • 16:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2004-dev.codfw.wmnet with reason: host reimage
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 16:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 16:11 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@1fb69c4]: T376456 (duration: 01m 15s)
  • 16:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:10 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@1fb69c4]: T376456
  • 15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:40 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
  • 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
  • 15:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cloudgw - cmooney@cumin1002"
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:48 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 14:48 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 14:47 urandom: upgrading data-gateway to v1.0.10
  • 14:46 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 14:46 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 14:39 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 14:38 eevans@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 14:31 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@c9a2532]: (no justification provided) (duration: 00m 25s)
  • 14:30 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@c9a2532]: (no justification provided)
  • 13:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: T376988', diff saved to https://phabricator.wikimedia.org/P69695 and previous config saved to /var/cache/conftool/dbconfig/20241011-135903-arnaudb.json
  • 13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: T376988', diff saved to https://phabricator.wikimedia.org/P69694 and previous config saved to /var/cache/conftool/dbconfig/20241011-134357-arnaudb.json
  • 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: T376988', diff saved to https://phabricator.wikimedia.org/P69693 and previous config saved to /var/cache/conftool/dbconfig/20241011-132852-arnaudb.json
  • 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: T376988', diff saved to https://phabricator.wikimedia.org/P69692 and previous config saved to /var/cache/conftool/dbconfig/20241011-131347-arnaudb.json
  • 13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
  • 13:12 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "renamed k8s prefixes descriptions in Netbox - ayounsi@cumin1002"
  • 13:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: T376988', diff saved to https://phabricator.wikimedia.org/P69691 and previous config saved to /var/cache/conftool/dbconfig/20241011-125841-arnaudb.json
  • 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: T376988', diff saved to https://phabricator.wikimedia.org/P69690 and previous config saved to /var/cache/conftool/dbconfig/20241011-124336-arnaudb.json
  • 12:37 hashar: Restarting Gerrit
  • 12:34 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts scandium.eqiad.wmnet
  • 12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:34 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
  • 12:34 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: scandium.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - akosiaris@cumin1002"
  • 12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 2%: T376988', diff saved to https://phabricator.wikimedia.org/P69688 and previous config saved to /var/cache/conftool/dbconfig/20241011-122830-arnaudb.json
  • 12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: T376988', diff saved to https://phabricator.wikimedia.org/P69687 and previous config saved to /var/cache/conftool/dbconfig/20241011-121325-arnaudb.json
  • 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367856)', diff saved to https://phabricator.wikimedia.org/P69686 and previous config saved to /var/cache/conftool/dbconfig/20241011-114446-ladsgroup.json
  • 11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69685 and previous config saved to /var/cache/conftool/dbconfig/20241011-114424-ladsgroup.json
  • 11:36 akosiaris@cumin1002: START - Cookbook sre.dns.netbox
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69684 and previous config saved to /var/cache/conftool/dbconfig/20241011-112917-ladsgroup.json
  • 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2092.codfw.wmnet
  • 11:27 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2092.codfw.wmnet
  • 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2092.codfw.wmnet
  • 11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2092.codfw.wmnet
  • 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2092.codfw.wmnet with OS bullseye
  • 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P69683 and previous config saved to /var/cache/conftool/dbconfig/20241011-111410-ladsgroup.json
  • 11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69682 and previous config saved to /var/cache/conftool/dbconfig/20241011-105903-ladsgroup.json
  • 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
  • 10:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 10:56 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 10:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2092.codfw.wmnet with reason: host reimage
  • 10:53 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 10:50 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd
  • 10:50 fabfur: enabled puppet on R:acme_chief::cert for T376800
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:47 fabfur@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host acmechief2002.codfw.wmnet
  • 10:44 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief2002.codfw.wmnet
  • 10:44 fabfur: rebooting acmechief1002|2002 (sequentially) (T376800)
  • 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1002.eqiad.wmnet
  • 10:37 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief1002.eqiad.wmnet
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2092.codfw.wmnet with OS bullseye
  • 10:34 fabfur: disabled puppet on acmechief1002 (T376800)
  • 10:33 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
  • 10:33 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2175.codfw.wmnet with reason: index corruption
  • 10:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 10:27 jynus@cumin1002: dbctl commit (dc=all): 'depool db2175', diff saved to https://phabricator.wikimedia.org/P69680 and previous config saved to /var/cache/conftool/dbconfig/20241011-102706-jynus.json
  • 10:26 fabfur: disabling puppet on R:acme_chief::cert for T376800
  • 10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2092.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367856)', diff saved to https://phabricator.wikimedia.org/P69678 and previous config saved to /var/cache/conftool/dbconfig/20241011-095847-ladsgroup.json
  • 09:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69677 and previous config saved to /var/cache/conftool/dbconfig/20241011-095826-ladsgroup.json
  • 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69676 and previous config saved to /var/cache/conftool/dbconfig/20241011-094319-ladsgroup.json
  • 09:41 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd
  • 09:38 akosiaris@cumin1002: START - Cookbook sre.hosts.decommission for hosts scandium.eqiad.wmnet
  • 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P69675 and previous config saved to /var/cache/conftool/dbconfig/20241011-092812-ladsgroup.json
  • 09:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 09:18 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 09:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69674 and previous config saved to /var/cache/conftool/dbconfig/20241011-091305-ladsgroup.json
  • 08:19 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 08:12 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 08:10 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 08:10 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 08:02 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 08:00 moritzm: upload ircstream 0.13.0+wmf12u2 to apt.wikimedia.org (sync to latest git and the async_broadcast feature branch) T376014
  • 07:59 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 07:56 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
  • 02:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69673 and previous config saved to /var/cache/conftool/dbconfig/20241011-021156-arnaudb.json
  • 01:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69672 and previous config saved to /var/cache/conftool/dbconfig/20241011-015649-arnaudb.json
  • 01:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P69671 and previous config saved to /var/cache/conftool/dbconfig/20241011-014142-arnaudb.json
  • 01:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69670 and previous config saved to /var/cache/conftool/dbconfig/20241011-012635-arnaudb.json
  • 01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T367781)', diff saved to https://phabricator.wikimedia.org/P69669 and previous config saved to /var/cache/conftool/dbconfig/20241011-012424-arnaudb.json
  • 01:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 01:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 01:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69668 and previous config saved to /var/cache/conftool/dbconfig/20241011-012401-arnaudb.json
  • 01:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69667 and previous config saved to /var/cache/conftool/dbconfig/20241011-010854-arnaudb.json
  • 00:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P69666 and previous config saved to /var/cache/conftool/dbconfig/20241011-005347-arnaudb.json
  • 00:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69665 and previous config saved to /var/cache/conftool/dbconfig/20241011-003840-arnaudb.json

2024-10-10

  • 23:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367781)', diff saved to https://phabricator.wikimedia.org/P69664 and previous config saved to /var/cache/conftool/dbconfig/20241010-233814-arnaudb.json
  • 23:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69663 and previous config saved to /var/cache/conftool/dbconfig/20241010-233752-arnaudb.json
  • 23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69662 and previous config saved to /var/cache/conftool/dbconfig/20241010-232245-arnaudb.json
  • 23:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P69661 and previous config saved to /var/cache/conftool/dbconfig/20241010-230738-arnaudb.json
  • 22:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69660 and previous config saved to /var/cache/conftool/dbconfig/20241010-225231-arnaudb.json
  • 22:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367781)', diff saved to https://phabricator.wikimedia.org/P69659 and previous config saved to /var/cache/conftool/dbconfig/20241010-225019-arnaudb.json
  • 22:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 22:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69658 and previous config saved to /var/cache/conftool/dbconfig/20241010-224957-arnaudb.json
  • 22:37 cstone: payments-wiki upgraded from ebb42c67 to 40e4a592
  • 22:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69657 and previous config saved to /var/cache/conftool/dbconfig/20241010-223450-arnaudb.json
  • 22:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P69656 and previous config saved to /var/cache/conftool/dbconfig/20241010-221943-arnaudb.json
  • 22:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69655 and previous config saved to /var/cache/conftool/dbconfig/20241010-220437-arnaudb.json
  • 22:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367781)', diff saved to https://phabricator.wikimedia.org/P69654 and previous config saved to /var/cache/conftool/dbconfig/20241010-220125-arnaudb.json
  • 22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 22:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 22:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 22:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 22:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69653 and previous config saved to /var/cache/conftool/dbconfig/20241010-220043-arnaudb.json
  • 21:52 jforrester@deploy2002: Finished deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939 (duration: 00m 08s)
  • 21:52 jforrester@deploy2002: Started deploy [integration/docroot@ff9e25a]: Add Codex PHP doc and source code link, for T375939
  • 21:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69652 and previous config saved to /var/cache/conftool/dbconfig/20241010-214536-arnaudb.json
  • 21:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P69651 and previous config saved to /var/cache/conftool/dbconfig/20241010-213029-arnaudb.json
  • 21:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69650 and previous config saved to /var/cache/conftool/dbconfig/20241010-211522-arnaudb.json
  • 21:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e] (duration: 00m 51s)
  • 21:04 aqu@deploy2002: Started deploy [airflow-dags/analytics@c9a2532]: Webrequest-Refine fix [airflow-dags@c9a2532e]
  • 21:04 thcipriani@deploy2002: Finished scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901) (duration: 08m 56s)
  • 20:59 thcipriani@deploy2002: jforrester, thcipriani: Continuing with sync
  • 20:57 thcipriani@deploy2002: jforrester, thcipriani: Backport for Update VE core submodule to master (c98f3a542) (T376901) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 thcipriani@deploy2002: Started scap sync-world: Backport for Update VE core submodule to master (c98f3a542) (T376901)
  • 20:27 eileen: config revision changed from 150b02a9 to 3c6d2054
  • 20:23 thcipriani@deploy2002: Finished scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) (duration: 08m 34s)
  • 20:18 thcipriani@deploy2002: bpirkle, thcipriani: Continuing with sync
  • 20:16 thcipriani@deploy2002: bpirkle, thcipriani: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367781)', diff saved to https://phabricator.wikimedia.org/P69649 and previous config saved to /var/cache/conftool/dbconfig/20241010-201456-arnaudb.json
  • 20:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 20:14 thcipriani@deploy2002: Started scap sync-world: Backport for REST: Make experimental endpoints available on beta and testwiki (T375512)
  • 20:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69648 and previous config saved to /var/cache/conftool/dbconfig/20241010-201433-arnaudb.json
  • 20:05 eileen: civicrm upgraded from 07dee21c to ff3144dd
  • 19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69647 and previous config saved to /var/cache/conftool/dbconfig/20241010-195926-arnaudb.json
  • 19:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P69646 and previous config saved to /var/cache/conftool/dbconfig/20241010-194419-arnaudb.json
  • 19:43 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
  • 19:43 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Webrequest-Refine fix on test cluster [airflow-dags@4b69f503]
  • 19:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69645 and previous config saved to /var/cache/conftool/dbconfig/20241010-192912-arnaudb.json
  • 19:23 rzl@deploy2002: Finished scap sync-world: chart version bump for 1078720 (duration: 02m 09s)
  • 19:21 rzl@deploy2002: Started scap sync-world: chart version bump for 1078720
  • 19:06 eileen: config revision changed from ae4a5be9 to 150b02a9
  • 18:50 papaul: maintenance on mr1-eqiad complete
  • 18:44 eileen: tools upgraded from 632bf430 to 62f2d170
  • 18:29 eileen: tools upgraded from e9c05e30 to 632bf430
  • 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P69644 and previous config saved to /var/cache/conftool/dbconfig/20241010-182846-arnaudb.json
  • 18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69643 and previous config saved to /var/cache/conftool/dbconfig/20241010-182808-arnaudb.json
  • 18:14 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 18:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69642 and previous config saved to /var/cache/conftool/dbconfig/20241010-181301-arnaudb.json
  • 18:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 18:00 papaul: ongoing maintenance on mr1-eqiad
  • 17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P69641 and previous config saved to /var/cache/conftool/dbconfig/20241010-175754-arnaudb.json
  • 17:57 root@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 17:54 root@cumin1002: START - Cookbook sre.puppet.renew-cert for dbprov1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
  • 17:47 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
  • 17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69640 and previous config saved to /var/cache/conftool/dbconfig/20241010-174247-arnaudb.json
  • 17:42 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in eqiad: Repooling echostore after migration to service mesh - T376766
  • 17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 17:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 17:38 swfrench-wmf: removing echostore eqiad deployment (depooled) to unblock breaking change - T376766
  • 17:34 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:34 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:34 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:33 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:33 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:32 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:25 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
  • 17:20 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in eqiad: Depooling echostore for migration to service mesh - T376766
  • 17:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:04 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
  • 16:59 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1001.eqiad.wmnet
  • 16:58 swfrench@cumin2002: START - Cookbook sre.discovery.service-route pool echostore in codfw: Repooling echostore after migration to service mesh - T376766
  • 16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:53 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:51 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
  • 16:51 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
  • 16:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 16:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 16:49 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd1001.eqiad.wmnet
  • 16:47 swfrench-wmf: removing echostore codfw deployment (depooled) to unblock breaking change - T376766
  • 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367781)', diff saved to https://phabricator.wikimedia.org/P69639 and previous config saved to /var/cache/conftool/dbconfig/20241010-164221-arnaudb.json
  • 16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 16:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69638 and previous config saved to /var/cache/conftool/dbconfig/20241010-164159-arnaudb.json
  • 16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
  • 16:30 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69637 and previous config saved to /var/cache/conftool/dbconfig/20241010-162652-arnaudb.json
  • 16:23 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 16:23 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003.eqiad.wmnet']
  • 16:21 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 16:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
  • 16:13 swfrench@cumin2002: START - Cookbook sre.discovery.service-route depool echostore in codfw: Depooling echostore for migration to service mesh - T376766
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P69636 and previous config saved to /var/cache/conftool/dbconfig/20241010-161145-arnaudb.json
  • 16:04 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
  • 16:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
  • 16:02 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
  • 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69635 and previous config saved to /var/cache/conftool/dbconfig/20241010-155638-arnaudb.json
  • 15:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367781)', diff saved to https://phabricator.wikimedia.org/P69634 and previous config saved to /var/cache/conftool/dbconfig/20241010-155426-arnaudb.json
  • 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 15:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69633 and previous config saved to /var/cache/conftool/dbconfig/20241010-155345-arnaudb.json
  • 15:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:47 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:40 papaul: mr1-drmrs maintenance complete
  • 15:39 dancy@deploy2002: Installation of scap version "4.110.0" completed for 211 hosts
  • 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69632 and previous config saved to /var/cache/conftool/dbconfig/20241010-153838-arnaudb.json
  • 15:35 dancy@deploy2002: Installing scap version "4.110.0" for 211 hosts
  • 15:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:28 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:25 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:23 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
  • 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P69631 and previous config saved to /var/cache/conftool/dbconfig/20241010-152331-arnaudb.json
  • 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69630 and previous config saved to /var/cache/conftool/dbconfig/20241010-150824-arnaudb.json
  • 15:08 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T367781)', diff saved to https://phabricator.wikimedia.org/P69629 and previous config saved to /var/cache/conftool/dbconfig/20241010-150512-arnaudb.json
  • 15:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 15:04 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69628 and previous config saved to /var/cache/conftool/dbconfig/20241010-150433-arnaudb.json
  • 15:02 papaul: ongoing maintenance on mr1-drmrs
  • 14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
  • 14:56 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Revert previous staging of Refine fixes on test cluster [airflow-dags@4b69f503]
  • 14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69626 and previous config saved to /var/cache/conftool/dbconfig/20241010-144926-arnaudb.json
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69625 and previous config saved to /var/cache/conftool/dbconfig/20241010-143713-arnaudb.json
  • 14:34 jhathaway@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
  • 14:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P69624 and previous config saved to /var/cache/conftool/dbconfig/20241010-143419-arnaudb.json
  • 14:28 jhathaway@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1002.eqiad.wmnet']
  • 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69623 and previous config saved to /var/cache/conftool/dbconfig/20241010-142206-arnaudb.json
  • 14:19 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
  • 14:19 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
  • 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69622 and previous config saved to /var/cache/conftool/dbconfig/20241010-141912-arnaudb.json
  • 14:18 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:18 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T367781)', diff saved to https://phabricator.wikimedia.org/P69621 and previous config saved to /var/cache/conftool/dbconfig/20241010-141704-arnaudb.json
  • 14:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 14:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:16 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
  • 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69620 and previous config saved to /var/cache/conftool/dbconfig/20241010-141642-arnaudb.json
  • 14:16 moritzm: failover Ganeti masters in magru to secondary node
  • 14:12 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P69619 and previous config saved to /var/cache/conftool/dbconfig/20241010-140659-arnaudb.json
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
  • 14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69618 and previous config saved to /var/cache/conftool/dbconfig/20241010-140135-arnaudb.json
  • 13:59 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
  • 13:59 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
  • 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69617 and previous config saved to /var/cache/conftool/dbconfig/20241010-135152-arnaudb.json
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
  • 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367781)', diff saved to https://phabricator.wikimedia.org/P69616 and previous config saved to /var/cache/conftool/dbconfig/20241010-134926-arnaudb.json
  • 13:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 13:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 13:48 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
  • 13:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P69615 and previous config saved to /var/cache/conftool/dbconfig/20241010-134628-arnaudb.json
  • 13:46 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:45 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245) (duration: 07m 16s)
  • 13:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
  • 13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Continuing with sync
  • 13:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, fomafix: Backport for Use ?? instead of default value in getRawVal() (T376245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Use ?? instead of default value in getRawVal() (T376245)
  • 13:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) (duration: 16m 09s)
  • 13:36 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
  • 13:35 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns4003.wikimedia.org
  • 13:35 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns4003.wikimedia.org
  • 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
  • 13:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Continuing with sync
  • 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69613 and previous config saved to /var/cache/conftool/dbconfig/20241010-133121-arnaudb.json
  • 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367781)', diff saved to https://phabricator.wikimedia.org/P69612 and previous config saved to /var/cache/conftool/dbconfig/20241010-133113-arnaudb.json
  • 13:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 13:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69611 and previous config saved to /var/cache/conftool/dbconfig/20241010-133049-arnaudb.json
  • 13:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
  • 13:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, cscott: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:21 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Turn on mobile support for Parsoid Read Views (but not on talk pages) (T269499 T376048), Turn on Parsoid Selective Update metrics (take 2) (T371713 T376433)
  • 13:17 dreamyjazz@deploy2002: Finished scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) (duration: 09m 12s)
  • 13:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
  • 13:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:ulsfo and A:dnsbox
  • 13:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69610 and previous config saved to /var/cache/conftool/dbconfig/20241010-131542-arnaudb.json
  • 13:12 dreamyjazz@deploy2002: dreamyjazz, kharlan: Continuing with sync
  • 13:11 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1004.eqiad.wmnet
  • 13:11 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1004.eqiad.wmnet
  • 13:10 dreamyjazz@deploy2002: dreamyjazz, kharlan: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
  • 13:08 dreamyjazz@deploy2002: Started scap sync-world: Backport for QuickSurvey.vue: Support using HTML in thank you message (T376517), extension.json: Add mediawiki.jqueryMsg to dependencies for ext.quicksurveys.lib (T376517)
  • 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
  • 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
  • 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P69609 and previous config saved to /var/cache/conftool/dbconfig/20241010-130035-arnaudb.json
  • 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
  • 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
  • 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
  • 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
  • 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
  • 12:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69608 and previous config saved to /var/cache/conftool/dbconfig/20241010-124528-arnaudb.json
  • 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367781)', diff saved to https://phabricator.wikimedia.org/P69607 and previous config saved to /var/cache/conftool/dbconfig/20241010-124319-arnaudb.json
  • 12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 12:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 12:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 12:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 12:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69606 and previous config saved to /var/cache/conftool/dbconfig/20241010-124241-arnaudb.json
  • 12:38 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bookworm
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 12:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69605 and previous config saved to /var/cache/conftool/dbconfig/20241010-122734-arnaudb.json
  • 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
  • 12:16 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
  • 12:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P69604 and previous config saved to /var/cache/conftool/dbconfig/20241010-121227-arnaudb.json
  • 12:00 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bookworm
  • 11:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T367781)', diff saved to https://phabricator.wikimedia.org/P69603 and previous config saved to /var/cache/conftool/dbconfig/20241010-115720-arnaudb.json
  • 11:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69599 and previous config saved to /var/cache/conftool/dbconfig/20241010-114042-arnaudb.json
  • 11:34 zabe@deploy2002: Finished scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 06m 58s)
  • 11:29 zabe@deploy2002: zabe: Continuing with sync
  • 11:29 zabe@deploy2002: zabe: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:27 zabe@deploy2002: Started scap sync-world: Backport for s2: Reduce revision-slots cache expiry to 60 seconds (T183490)
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P69598 and previous config saved to /var/cache/conftool/dbconfig/20241010-112535-arnaudb.json
  • 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7001.magru.wmnet
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7001.magru.wmnet
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
  • 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
  • 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69597 and previous config saved to /var/cache/conftool/dbconfig/20241010-111028-arnaudb.json
  • 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367781)', diff saved to https://phabricator.wikimedia.org/P69596 and previous config saved to /var/cache/conftool/dbconfig/20241010-110920-arnaudb.json
  • 11:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 11:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 11:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69595 and previous config saved to /var/cache/conftool/dbconfig/20241010-110857-arnaudb.json
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
  • 10:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
  • 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
  • 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69594 and previous config saved to /var/cache/conftool/dbconfig/20241010-105350-arnaudb.json
  • 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
  • 10:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
  • 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P69593 and previous config saved to /var/cache/conftool/dbconfig/20241010-103843-arnaudb.json
  • 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
  • 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69592 and previous config saved to /var/cache/conftool/dbconfig/20241010-102336-arnaudb.json
  • 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
  • 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367781)', diff saved to https://phabricator.wikimedia.org/P69591 and previous config saved to /var/cache/conftool/dbconfig/20241010-102127-arnaudb.json
  • 10:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69590 and previous config saved to /var/cache/conftool/dbconfig/20241010-102104-arnaudb.json
  • 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69589 and previous config saved to /var/cache/conftool/dbconfig/20241010-100557-arnaudb.json
  • 09:54 jayme@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubestage1004.eqiad.wmnet
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1002.wikimedia.org
  • 09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1004.eqiad.wmnet
  • 09:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
  • 09:52 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
  • 09:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P69587 and previous config saved to /var/cache/conftool/dbconfig/20241010-095050-arnaudb.json
  • 09:50 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bookworm
  • 09:49 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1002.wikimedia.org
  • 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69586 and previous config saved to /var/cache/conftool/dbconfig/20241010-093544-arnaudb.json
  • 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367781)', diff saved to https://phabricator.wikimedia.org/P69585 and previous config saved to /var/cache/conftool/dbconfig/20241010-093335-arnaudb.json
  • 09:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 09:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69584 and previous config saved to /var/cache/conftool/dbconfig/20241010-093313-arnaudb.json
  • 09:33 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 09:30 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69583 and previous config saved to /var/cache/conftool/dbconfig/20241010-092735-arnaudb.json
  • 09:21 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.26 refs T375657
  • 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69582 and previous config saved to /var/cache/conftool/dbconfig/20241010-091806-arnaudb.json
  • 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
  • 09:14 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bookworm
  • 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69581 and previous config saved to /var/cache/conftool/dbconfig/20241010-091228-arnaudb.json
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
  • 09:10 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
  • 09:10 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
  • 09:07 aklapper@deploy2002: Finished scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) (duration: 12m 09s)
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
  • 09:03 aklapper@deploy2002: hashar, aklapper: Continuing with sync
  • 09:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P69580 and previous config saved to /var/cache/conftool/dbconfig/20241010-090259-arnaudb.json
  • 09:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
  • 08:57 aklapper@deploy2002: hashar, aklapper: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P69579 and previous config saved to /var/cache/conftool/dbconfig/20241010-085721-arnaudb.json
  • 08:55 aklapper@deploy2002: Started scap sync-world: Backport for Revert "Use HTML markup instead of bidi control chars in wiki changes" (T375975 T376814)
  • 08:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69578 and previous config saved to /var/cache/conftool/dbconfig/20241010-084752-arnaudb.json
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T367781)', diff saved to https://phabricator.wikimedia.org/P69577 and previous config saved to /var/cache/conftool/dbconfig/20241010-084543-arnaudb.json
  • 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69576 and previous config saved to /var/cache/conftool/dbconfig/20241010-084521-arnaudb.json
  • 08:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69575 and previous config saved to /var/cache/conftool/dbconfig/20241010-084214-arnaudb.json
  • 08:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
  • 08:41 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on cloudsw1-b1-codfw.mgmt with reason: prevent bgp alerts firing until CRs configured
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
  • 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367781)', diff saved to https://phabricator.wikimedia.org/P69574 and previous config saved to /var/cache/conftool/dbconfig/20241010-084003-arnaudb.json
  • 08:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 08:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
  • 08:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: T376868', diff saved to https://phabricator.wikimedia.org/P69573 and previous config saved to /var/cache/conftool/dbconfig/20241010-083347-arnaudb.json
  • 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69572 and previous config saved to /var/cache/conftool/dbconfig/20241010-083013-arnaudb.json
  • 08:21 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 08:21 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
  • 08:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: T376868', diff saved to https://phabricator.wikimedia.org/P69571 and previous config saved to /var/cache/conftool/dbconfig/20241010-081841-arnaudb.json
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P69570 and previous config saved to /var/cache/conftool/dbconfig/20241010-081506-arnaudb.json
  • 08:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: T376867', diff saved to https://phabricator.wikimedia.org/P69569 and previous config saved to /var/cache/conftool/dbconfig/20241010-080711-arnaudb.json
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1002.eqiad.wmnet
  • 08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: T376868', diff saved to https://phabricator.wikimedia.org/P69568 and previous config saved to /var/cache/conftool/dbconfig/20241010-080336-arnaudb.json
  • 08:02 moritzm: irc.wikimedia.org now directs to the ircstream implementation on irc1003.wikimedia.org T376014
  • 08:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69567 and previous config saved to /var/cache/conftool/dbconfig/20241010-075959-arnaudb.json
  • 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367781)', diff saved to https://phabricator.wikimedia.org/P69566 and previous config saved to /var/cache/conftool/dbconfig/20241010-075951-arnaudb.json
  • 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 07:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 07:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
  • 07:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69565 and previous config saved to /var/cache/conftool/dbconfig/20241010-075911-arnaudb.json
  • 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 07:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: T376867', diff saved to https://phabricator.wikimedia.org/P69564 and previous config saved to /var/cache/conftool/dbconfig/20241010-075206-arnaudb.json
  • 07:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: T376868', diff saved to https://phabricator.wikimedia.org/P69563 and previous config saved to /var/cache/conftool/dbconfig/20241010-074831-arnaudb.json
  • 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
  • 07:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69562 and previous config saved to /var/cache/conftool/dbconfig/20241010-074404-arnaudb.json
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
  • 07:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: T376867', diff saved to https://phabricator.wikimedia.org/P69561 and previous config saved to /var/cache/conftool/dbconfig/20241010-073700-arnaudb.json
  • 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudidm2001-dev.codfw.wmnet
  • 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 07:33 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudidm2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - slyngshede@cumin1002"
  • 07:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: T376868', diff saved to https://phabricator.wikimedia.org/P69560 and previous config saved to /var/cache/conftool/dbconfig/20241010-073326-arnaudb.json
  • 07:33 awight: UTC morning deployments done.
  • 07:32 hashar: Stopped gerrit service on gerrit2003.codfw.wmnet since it is not starting up properly | T372804
  • 07:32 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:31 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:30 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P69559 and previous config saved to /var/cache/conftool/dbconfig/20241010-072857-arnaudb.json
  • 07:28 awight@deploy2002: Finished scap sync-world: Backport for [config] Rename moved gadget name setting (T362771) (duration: 09m 22s)
  • 07:25 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
  • 07:23 awight@deploy2002: awight, wmde-fisch: Continuing with sync
  • 07:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: T376867', diff saved to https://phabricator.wikimedia.org/P69558 and previous config saved to /var/cache/conftool/dbconfig/20241010-072155-arnaudb.json
  • 07:21 awight@deploy2002: awight, wmde-fisch: Backport for [config] Rename moved gadget name setting (T362771) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 07:18 awight@deploy2002: Started scap sync-world: Backport for [config] Rename moved gadget name setting (T362771)
  • 07:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: T376868', diff saved to https://phabricator.wikimedia.org/P69557 and previous config saved to /var/cache/conftool/dbconfig/20241010-071820-arnaudb.json
  • 07:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1236 T376868', diff saved to https://phabricator.wikimedia.org/P69556 and previous config saved to /var/cache/conftool/dbconfig/20241010-071721-arnaudb.json
  • 07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 07:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 07:15 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cloudidm2001-dev.codfw.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
  • 07:15 slyngshede@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudidm2001-dev.codfw.wmnet
  • 07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary T376868', diff saved to https://phabricator.wikimedia.org/P69555 and previous config saved to /var/cache/conftool/dbconfig/20241010-071453-arnaudb.json
  • 07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 07:14 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 07:14 arnaudb: Starting s7 eqiad failover from db1236 to db1181 - T376868
  • 07:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69554 and previous config saved to /var/cache/conftool/dbconfig/20241010-071350-arnaudb.json
  • 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367781)', diff saved to https://phabricator.wikimedia.org/P69553 and previous config saved to /var/cache/conftool/dbconfig/20241010-071242-arnaudb.json
  • 07:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69552 and previous config saved to /var/cache/conftool/dbconfig/20241010-071219-arnaudb.json
  • 07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 07:08 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T376868', diff saved to https://phabricator.wikimedia.org/P69551 and previous config saved to /var/cache/conftool/dbconfig/20241010-070843-arnaudb.json
  • 07:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
  • 07:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T376868
  • 07:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: T376867', diff saved to https://phabricator.wikimedia.org/P69550 and previous config saved to /var/cache/conftool/dbconfig/20241010-070650-arnaudb.json
  • 06:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69549 and previous config saved to /var/cache/conftool/dbconfig/20241010-065712-arnaudb.json
  • 06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 06:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: T376867', diff saved to https://phabricator.wikimedia.org/P69548 and previous config saved to /var/cache/conftool/dbconfig/20241010-065145-arnaudb.json
  • 06:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1230 T376867', diff saved to https://phabricator.wikimedia.org/P69547 and previous config saved to /var/cache/conftool/dbconfig/20241010-065048-arnaudb.json
  • 06:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary T376867', diff saved to https://phabricator.wikimedia.org/P69546 and previous config saved to /var/cache/conftool/dbconfig/20241010-064827-arnaudb.json
  • 06:47 arnaudb: Starting s5 eqiad failover from db1230 to db1183 - T376867
  • 06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 06:43 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T376867', diff saved to https://phabricator.wikimedia.org/P69545 and previous config saved to /var/cache/conftool/dbconfig/20241010-064219-arnaudb.json
  • 06:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
  • 06:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P69544 and previous config saved to /var/cache/conftool/dbconfig/20241010-064206-arnaudb.json
  • 06:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T376867
  • 06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 06:37 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 06:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69543 and previous config saved to /var/cache/conftool/dbconfig/20241010-062659-arnaudb.json
  • 06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367781)', diff saved to https://phabricator.wikimedia.org/P69542 and previous config saved to /var/cache/conftool/dbconfig/20241010-062450-arnaudb.json
  • 06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 06:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:10 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 06:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 06:03 XioNoX: cr2-eqsin> request vmhost snapshot - T375961
  • 03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69541 and previous config saved to /var/cache/conftool/dbconfig/20241010-031553-ladsgroup.json
  • 03:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69540 and previous config saved to /var/cache/conftool/dbconfig/20241010-031531-ladsgroup.json
  • 03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69539 and previous config saved to /var/cache/conftool/dbconfig/20241010-030048-ladsgroup.json
  • 03:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69538 and previous config saved to /var/cache/conftool/dbconfig/20241010-030025-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69537 and previous config saved to /var/cache/conftool/dbconfig/20241010-024543-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69536 and previous config saved to /var/cache/conftool/dbconfig/20241010-024519-ladsgroup.json
  • 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69535 and previous config saved to /var/cache/conftool/dbconfig/20241010-023037-ladsgroup.json
  • 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69534 and previous config saved to /var/cache/conftool/dbconfig/20241010-023014-ladsgroup.json
  • 02:02 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
  • 02:02 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: repooling eqsin after cr2-eqsin replaced, T375961]
  • 01:50 sukhe: restart bird on doh5001 and dns5003 to resolve flapping BFD session after cr2-eqsin junos upgrade
  • 01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
  • 00:46 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1006.eqiad.wmnet
  • 00:41 eileen: civicrm upgraded from 3b6a7cbb to 07dee21c
  • 00:27 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 00:26 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
  • 00:19 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
  • 00:19 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus2005.codfw.wmnet
  • 00:02 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 00:02 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet

2024-10-09

  • 23:52 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 23:51 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
  • 23:49 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2003.wikimedia.org
  • 23:43 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
  • 23:41 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus1005.eqiad.wmnet
  • 23:26 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 23:25 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
  • 23:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
  • 23:07 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 23:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 22:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1012.eqiad.wmnet with OS bookworm
  • 22:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:51 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1223.eqiad.wmnet
  • 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69532 and previous config saved to /var/cache/conftool/dbconfig/20241009-225055-ladsgroup.json
  • 22:40 eileen: civicrm upgraded from cc7c7744 to 3b6a7cbb
  • 22:35 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
  • 22:30 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
  • 22:28 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
  • 22:28 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241009-3
  • 22:01 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
  • 22:00 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: release 20241009-3
  • 21:57 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
  • 21:57 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: release 20241009-3
  • 21:55 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
  • 21:54 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
  • 21:48 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
  • 21:47 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009-2
  • 21:45 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
  • 21:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
  • 21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:44 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:44 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:42 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:42 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69531 and previous config saved to /var/cache/conftool/dbconfig/20241009-214117-ladsgroup.json
  • 21:41 dzahn@cumin2002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:32 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20241009
  • 21:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
  • 21:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69530 and previous config saved to /var/cache/conftool/dbconfig/20241009-212612-ladsgroup.json
  • 21:22 mutante: [apt1002:~] $ sudo -i reprepro --component thirdparty/gitlab-bullseye update bullseye-wikimedia
  • 21:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
  • 21:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69529 and previous config saved to /var/cache/conftool/dbconfig/20241009-211107-ladsgroup.json
  • 21:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
  • 20:56 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
  • 20:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69528 and previous config saved to /var/cache/conftool/dbconfig/20241009-205601-ladsgroup.json
  • 20:44 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
  • 20:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
  • 20:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
  • 20:17 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
  • 20:17 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and (A:esams or A:drmrs) and A:dnsbox
  • 20:12 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 20:12 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 20:08 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and A:dnsbox
  • 20:08 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
  • 19:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
  • 19:55 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and A:dnsbox
  • 19:55 swfrench-wmf: removing echostore staging deployment to unblock breaking change - T376766
  • 19:46 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and A:dnsbox
  • 19:46 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
  • 19:38 mforns@deploy2002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:38 mforns@deploy2002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 19:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
  • 19:35 mforns@deploy2002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:35 mforns@deploy2002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2002.codfw.wmnet with OS bookworm
  • 19:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc2001.codfw.wmnet with OS bookworm
  • 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:27 mforns@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:27 mforns@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 19:20 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
  • 19:05 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
  • 19:04 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and A:dnsbox
  • 19:04 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
  • 19:04 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
  • 18:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
  • 18:45 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
  • 18:41 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 18:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
  • 18:35 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
  • 18:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
  • 18:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 18:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:29 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 18:29 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 18:28 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 18:26 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1212.eqiad.wmnet
  • 18:26 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus5002.eqsin.wmnet
  • 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69527 and previous config saved to /var/cache/conftool/dbconfig/20241009-182632-ladsgroup.json
  • 18:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:24 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
  • 18:24 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:eqsin and A:dnsbox
  • 18:19 eileen: config revision changed from 739e8794 to ae4a5be9
  • 18:18 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 18:16 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 18:16 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 18:15 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs[5004-5006].eqsin.wmnet
  • 18:15 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs[5004-5006].eqsin.wmnet
  • 18:15 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
  • 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
  • 18:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2002.codfw.wmnet with reason: host reimage
  • 18:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
  • 18:08 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
  • 18:06 eileen: civicrm upgraded from ae54bd5e to cc7c7744
  • 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc2001.codfw.wmnet with reason: host reimage
  • 18:01 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 18:01 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 17:58 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s5.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
  • 17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
  • 17:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 17:51 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 17:51 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69526 and previous config saved to /var/cache/conftool/dbconfig/20241009-174501-ladsgroup.json
  • 17:44 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 17:41 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 17:40 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 17:38 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 17:34 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 17:31 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
  • 17:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69525 and previous config saved to /var/cache/conftool/dbconfig/20241009-172956-ladsgroup.json
  • 17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
  • 17:23 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
  • 17:23 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
  • 17:21 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host alert1002.wikimedia.org
  • 17:13 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert1002.wikimedia.org
  • 17:12 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
  • 17:12 denisse@cumin2002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
  • 16:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69523 and previous config saved to /var/cache/conftool/dbconfig/20241009-165944-ladsgroup.json
  • 16:50 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 16:50 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 16:50 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 16:50 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 16:50 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:50 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:48 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:48 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facin cloudsw - cmooney@cumin1002"
  • 16:44 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for codfw cr IPs facin cloudsw - cmooney@cumin1002"
  • 16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
  • 16:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:32 bvibber: starting requeueTranscodes on old school mwmaint2002 after the k8s blowup last night
  • 16:23 sukhe: running authdns-update to fix broken zone files on dns2004
  • 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
  • 16:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: picking up zone file 1.0.e.f.0.0.1.a.0.8.c.e.2.0.a.2.ip6.arpa - sukhe@cumin1002"
  • 16:21 sukhe: forcing commit 95858ba through sre.dns.netbox
  • 16:20 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2002.codfw.wmnet with OS bookworm
  • 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-misc2001.codfw.wmnet with OS bookworm
  • 16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns2005.wikimedia.org
  • 15:58 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns2005.wikimedia.org
  • 15:54 sukhe@cumin1002: END (ERROR) - Cookbook sre.dns.roll-reboot (exit_code=97) rolling reboot on A:dnsbox
  • 15:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:53 sukhe: running authdns-update
  • 15:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:52 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in2001.wikimedia.org
  • 15:49 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
  • 15:49 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[5004-5006].eqsin.wmnet with reason: site is depooled, cr2-eqsin is being replaced
  • 15:48 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in2001.wikimedia.org
  • 15:48 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx-in1001.wikimedia.org
  • 15:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
  • 15:44 jhathaway@cumin1002: START - Cookbook sre.hosts.reboot-single for host mx-in1001.wikimedia.org
  • 15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:43 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
  • 15:30 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
  • 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp.wikimedia.org on all recursors
  • 15:26 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache idp.wikimedia.org on all recursors
  • 15:25 fabfur: eqsin depooled for T375961
  • 15:24 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
  • 15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacement, T375961]
  • 15:24 fabfur@cumin1002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
  • 15:24 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: eqsin cr replacementAA, T375961]
  • 15:23 mutante: stewards* - rebooting machines - T351202
  • 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
  • 15:22 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add IPv6 reverse entry for cloudsw1-b1-codfw interface IPs - cmooney@cumin1002"
  • 15:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
  • 15:20 sukhe: running dummy authdns-update
  • 15:19 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:17 mutante: planet.wikimedia.org - rebooting backends
  • 15:09 mutante: people.wikimedia.org - rebooting backends
  • 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
  • 15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns1006.wikimedia.org
  • 15:07 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns1006.wikimedia.org
  • 15:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host crm2001.codfw.wmnet
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqsin with reason: router replacement
  • 15:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
  • 15:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr2-eqsin with reason: router replacement
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr2-eqsin with reason: router replacement
  • 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTARTand with Dell SCP reboot policy FORCED
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host crm2001.codfw.wmnet
  • 14:59 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
  • 14:58 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
  • 14:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
  • 14:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[2010-2011].codfw.wmnet with reason: T376800
  • 14:51 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:51 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
  • 14:50 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:50 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
  • 14:47 brouberol@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
  • 14:47 brouberol@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on P{cephosd1001*} and (A:cephosd)
  • 14:47 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:45 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:44 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:44 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum
  • 14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
  • 14:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
  • 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
  • 14:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
  • 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 14:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
  • 14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:29 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db1198.eqiad.wmnet onto db1157.eqiad.wmnet
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T367856)', diff saved to https://phabricator.wikimedia.org/P69522 and previous config saved to /var/cache/conftool/dbconfig/20241009-142848-ladsgroup.json
  • 14:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 14:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69521 and previous config saved to /var/cache/conftool/dbconfig/20241009-142826-ladsgroup.json
  • 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling for reclone (T375652)', diff saved to https://phabricator.wikimedia.org/P69520 and previous config saved to /var/cache/conftool/dbconfig/20241009-142404-ladsgroup.json
  • 14:23 moritzm: failover master for ganeti/routed to ganeti2033
  • 14:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
  • 14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
  • 14:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1012.eqiad.wmnet with reason: host reimage
  • 14:21 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
  • 14:21 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
  • 14:21 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
  • 14:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb2004-dev
  • 14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
  • 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:20 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 14:20 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:18 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 14:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 14:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
  • 14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69519 and previous config saved to /var/cache/conftool/dbconfig/20241009-141319-ladsgroup.json
  • 14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
  • 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 moritzm: installing Apache security updates
  • 14:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
  • 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 14:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
  • 14:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2004.wikimedia.org
  • 14:06 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
  • 14:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 14:03 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp2004.wikimedia.org
  • 14:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
  • 14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:01 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
  • 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P69517 and previous config saved to /var/cache/conftool/dbconfig/20241009-135812-ladsgroup.json
  • 13:57 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
  • 13:56 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 13:55 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
  • 13:54 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
  • 13:53 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:53 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:53 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
  • 13:52 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox
  • 13:52 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 13:51 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:51 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
  • 13:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2004.wikimedia.org
  • 13:50 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
  • 13:50 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
  • 13:50 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup[1010-1011].eqiad.wmnet with reason: T376800
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
  • 13:49 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
  • 13:48 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:48 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2004.wikimedia.org
  • 13:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) (duration: 07m 04s)
  • 13:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1004.wikimedia.org
  • 13:45 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
  • 13:45 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host flink-zk1001.eqiad.wmnet
  • 13:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
  • 13:44 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Continuing with sync
  • 13:44 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp1004.wikimedia.org
  • 13:43 lucaswerkmeister-wmde@deploy2002: albertoleoncio, lucaswerkmeister-wmde: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1004.wikimedia.org
  • 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69516 and previous config saved to /var/cache/conftool/dbconfig/20241009-134305-ladsgroup.json
  • 13:42 brouberol@cumin1002: END (ERROR) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=97) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
  • 13:42 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1028.eqiad.wmnet
  • 13:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [brwikimedia] Enable the CampaignEvents extension (T376747)
  • 13:41 brouberol@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
  • 13:39 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1004.wikimedia.org
  • 13:39 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum
  • 13:39 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 $ printf 'https://en.wikipedia.org/static/images/%s\n' 'project-logos/sdwiki.png' 'project-logos/sdwiki-1.5x.png' 'project-logos/sdwiki-2x.png' 'mobile/copyright/wikipedia-wordmark-sd.svg' 'mobile/copyright/wikipedia-tagline-sd.svg' | mwscript-k8s --attach -- purgeList.php # T376536
  • 13:35 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536) (duration: 19m 34s)
  • 13:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
  • 13:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gerrit2003.wikimedia.org
  • 13:31 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
  • 13:30 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Continuing with sync
  • 13:30 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
  • 13:28 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
  • 13:27 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
  • 13:23 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:22 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1004.eqiad.wmnet
  • 13:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ammarpad: Backport for sdwiki: Add new logo and tagline (T376536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad1004.eqiad.wmnet
  • 13:16 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad2002.codfw.wmnet
  • 13:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for sdwiki: Add new logo and tagline (T376536)
  • 13:14 kharlan@deploy2002: Finished scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) (duration: 10m 37s)
  • 13:12 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host etherpad2002.codfw.wmnet
  • 13:09 kharlan@deploy2002: kharlan: Continuing with sync
  • 13:06 kharlan@deploy2002: kharlan: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 kharlan@deploy2002: Started scap sync-world: Backport for QuickSurveys: Deploy Safety Survey with zero coverage (T376517)
  • 12:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rpki2002.codfw.wmnet
  • 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 12:41 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rpki2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 12:38 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 12:33 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts rpki2002.codfw.wmnet
  • 12:24 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:24 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:23 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:23 jelto@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 moritzm: installing initramfs-tools bugfix updates from Bookworm point release
  • 12:16 jelto@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:15 jelto@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:15 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:15 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:54 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b2c30ad]: T375153 (duration: 02m 32s)
  • 11:52 jynus: ran systemctl start wmf_auto_restart_routinator.service on rpki2003
  • 11:52 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b2c30ad]: T375153
  • 11:24 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P69513 and previous config saved to /var/cache/conftool/dbconfig/20241009-111154-ladsgroup.json
  • 11:04 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 11:00 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 11:00 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P69511 and previous config saved to /var/cache/conftool/dbconfig/20241009-105647-ladsgroup.json
  • 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
  • 10:44 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
  • 10:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P69507 and previous config saved to /var/cache/conftool/dbconfig/20241009-104142-ladsgroup.json
  • 10:35 elukey@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 10:28 elukey: roll restart swift-proxy on ms-fe* to pick up https://gerrit.wikimedia.org/r/c/operations/puppet/+/1078380
  • 10:27 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1027.eqiad.wmnet
  • 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P69506 and previous config saved to /var/cache/conftool/dbconfig/20241009-102636-ladsgroup.json
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
  • 10:11 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 09:42 Dreamy_Jazz: Started time-limited MediaModeration scan on enwiki for 16 hours to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 09:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1026.eqiad.wmnet
  • 08:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:51 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:48 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 08:46 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:37 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:23 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host cloudcephmon1005.eqiad.wmnet
  • 08:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephmon1005.eqiad.wmnet
  • 08:12 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.26 refs T375657
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
  • 08:02 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1021.eqiad.wmnet
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
  • 07:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:47 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1011.eqiad.wmnet
  • 07:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:22 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:22 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:13 moritzm: remove ganeti2010 from active nodes T376594
  • 06:37 eileen: civicrm upgraded from 251e958f to ae54bd5e
  • 06:08 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 06:06 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 03:36 eileen: civicrm upgraded from 61718eae to 251e958f
  • 01:26 eileen: tools upgraded from 3f7b238d to e9c05e30
  • 00:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1012.eqiad.wmnet with OS bookworm

2024-10-08

  • 22:36 tzatziki: removing 1 file for legal compliance
  • 22:32 tzatziki: removing 3 files for legal compliance
  • 22:16 tzatziki: removing 1 file for legal compliance
  • 22:11 tzatziki: removing 3 files for legal compliance
  • 21:59 tzatziki: removing 3 files for legal compliance
  • 21:41 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
  • 21:41 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on gerrit2003.wikimedia.org with reason: initial gerrit deploy wip
  • 21:35 bvibber: running requeueTranscodes in k8s maint to clean up ios video transcodes (T363966)
  • 21:34 mutante: gerrit2003 - sudo -u gerrit-deploy /usr/bin/scap deploy-local --repo gerrit/gerrit -D log_json:False (for some reason this fails in puppet but works manually) T372804 T257317 T317412
  • 21:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host backup1012.eqiad.wmnet with OS bookworm
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1022.eqiad.wmnet with OS bullseye
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 21:06 eileen: config revision changed from 9ba217d2 to c84a1354
  • 21:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
  • 20:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1022.eqiad.wmnet with reason: host reimage
  • 20:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
  • 20:54 cjming: end of UTC late backport window
  • 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
  • 20:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
  • 20:54 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:52 cjming@deploy2002: Finished scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) (duration: 07m 39s)
  • 20:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 20:48 cjming@deploy2002: bvibber, cjming: Continuing with sync
  • 20:47 cjming@deploy2002: bvibber, cjming: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 cjming@deploy2002: Started scap sync-world: Backport for Switch iOS back-compat video transcodes from HLS to regular QuickTime (T363966)
  • 20:42 cjming@deploy2002: Finished scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit (duration: 09m 58s)
  • 20:37 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
  • 20:34 cjming@deploy2002: jdlrobson, cjming: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:32 cjming@deploy2002: Started scap sync-world: Backport for Dark mode: Make LiquidThreads namespace exclusion explicit
  • 20:29 cjming@deploy2002: Finished scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549) (duration: 19m 28s)
  • 20:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
  • 20:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
  • 20:26 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
  • 20:26 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on gerrit2003.wikimedia.org with reason: applying gerrit profile
  • 20:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:24 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
  • 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host backup1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:12 cjming@deploy2002: jdlrobson, cjming: Backport for Expand Vector 2022 roll out and support local variants (T375549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1012
  • 20:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host backup1012
  • 20:10 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
  • 20:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1012 - jclark@cumin1002"
  • 20:10 cjming@deploy2002: Started scap sync-world: Backport for Expand Vector 2022 roll out and support local variants (T375549)
  • 20:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 18:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:54 swfrench-wmf: ran authdns-update on dns1004 to pick up mwdebug-next record - T372604
  • 18:50 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=mwdebug-next,name=codfw [reason: pooling mwdebug-next in codfw to match mwdebug - T372604]
  • 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
  • 18:43 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for pfw1 lo0 - pt1979@cumin2002"
  • 18:43 cdanis: 💔cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin -b1 -s120 A:dnsbox 'run-puppet-agent --enable "cdanis rolling out T344171 Ie7d5091bca40"'
  • 18:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:40 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕝☕ sudo cumin A:dnsbox 'disable-puppet "cdanis rolling out T344171 Ie7d5091bca40"'
  • 18:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:39 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:38 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:34 elukey@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 17:45 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
  • 17:39 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw (T372604)
  • 17:35 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
  • 17:35 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T372604)
  • 17:34 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:codfw' - T372604
  • 17:27 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
  • 17:21 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T372604)
  • 17:17 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
  • 17:12 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T372604)
  • 17:09 swfrench-wmf: ran and enabled puppet-agent on 'A:lvs and A:eqiad' - T372604
  • 17:04 swfrench-wmf: ran disable-puppet on 'A:lvs and (A:eqiad or A:codfw)' - T372604
  • 16:57 moritzm: enable Puppet fleet-wide for puppetmaster1001 hardware maintenance
  • 16:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused (duration: 06m 50s)
  • 16:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 16:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 16:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 16:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
  • 16:44 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 16:44 dreamyjazz@deploy2002: dreamyjazz: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
  • 16:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1001.eqiad.wmnet with reason: RAM expansion
  • 16:42 dreamyjazz@deploy2002: Started scap sync-world: Backport for Define wgGlobalBlockingEnableAutoblocks as false (T374853), Remove wgGlobalBlockingAllowGlobalAccountBlocks as unused
  • 16:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
  • 16:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudlb2004-dev.codfw.wmnet
  • 16:37 moritzm: disable Puppet fleet-wide for puppetmaster1001 hardware maintenance
  • 16:28 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb2004-dev.codfw.wmnet
  • 16:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad
  • 16:25 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad
  • 16:24 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad
  • 16:23 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad
  • 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
  • 16:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
  • 16:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
  • 16:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
  • 16:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
  • 16:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
  • 16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:41 papaul: mr1-magru end of maintenance
  • 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f7-eqiad
  • 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f7-eqiad
  • 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e7-eqiad
  • 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e7-eqiad
  • 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e6-eqiad
  • 15:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e6-eqiad
  • 15:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f6-eqiad
  • 15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f6-eqiad
  • 15:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f5-eqiad
  • 15:33 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f5-eqiad
  • 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-eqiad
  • 15:32 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e5-eqiad
  • 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 15:26 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 15:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 15:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
  • 15:19 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720 (duration: 01m 07s)
  • 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: deploy phab1004 for T376720
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720 (duration: 00m 26s)
  • 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@40a63c9]: test deploy phab2002 for T376720
  • 15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
  • 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
  • 15:02 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
  • 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phabricator.wikimedia.org with reason: version upgrade
  • 15:02 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
  • 15:02 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: version upgrade
  • 15:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
  • 15:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: version upgrade
  • 14:58 papaul: mr1-magru ongoing maintenance
  • 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudlb2004-dev to codfw - jhancock@cumin2002"
  • 14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:47 sergi0: deployment-prep: `sgimeno@deployment-mwmaint03:~$ foreachwiki userOptions.php --delete --old=1 growthexperiments-tour-newimpact-discovery` (T376461)
  • 14:41 moritzm: installing python-aiosmtpd security updates
  • 14:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
  • 14:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 14:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 14:23 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1010.eqiad.wmnet
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
  • 14:22 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb2004-dev
  • 14:15 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb2004-dev
  • 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-misc2001
  • 14:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-misc2001
  • 14:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1009.eqiad.wmnet
  • 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
  • 14:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
  • 14:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:59 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:53 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) (duration: 07m 03s)
  • 13:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:49 zabe@deploy2002: zabe: Continuing with sync
  • 13:48 zabe@deploy2002: zabe: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:46 zabe@deploy2002: Started scap sync-world: Backport for Stop setting wgAbuseFilterActorTableSchemaMigrationStage (T188180)
  • 13:46 zabe@deploy2002: Finished scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) (duration: 07m 10s)
  • 13:41 zabe@deploy2002: zabe: Continuing with sync
  • 13:41 zabe@deploy2002: zabe: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 zabe@deploy2002: Started scap sync-world: Backport for s5: Reduce revision-slots cache expiry to 60 seconds (T183490)
  • 13:33 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) (duration: 06m 56s)
  • 13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Continuing with sync
  • 13:27 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, musikanimal: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove $wgCodeMirrorRTL temporary feature flag (T170001 T357795)
  • 13:24 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 13:24 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 13:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) (duration: 08m 17s)
  • 13:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:07 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Continuing with sync
  • 13:06 lucaswerkmeister-wmde@deploy2002: ammarpad, lucaswerkmeister-wmde: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for hawiki: Add temporary tagline for Vector-2022 (T376049)
  • 12:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:57 Amir1: dropping povwatch_log on all.dblist (T54924 and T376627)
  • 12:55 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti2036.codfw.wmnet
  • 12:53 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:53 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Remove flow from techconductwiki (T332022) (duration: 09m 27s)
  • 12:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:45 moritzm: installing lua5.4 bugfix updates
  • 12:44 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:42 ladsgroup@deploy2002: ladsgroup: Backport for Remove flow from techconductwiki (T332022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:39 ladsgroup@deploy2002: Started scap sync-world: Backport for Remove flow from techconductwiki (T332022)
  • 12:39 elukey@cumin1002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
  • 12:29 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
  • 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
  • 12:26 moritzm: remove ganeti2009 from active nodes T376594
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 12:19 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
  • 12:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1008.eqiad.wmnet
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
  • 12:01 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 11:56 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 11:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1007.eqiad.wmnet
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1006.eqiad.wmnet
  • 11:35 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
  • 11:33 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
  • 11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
  • 11:30 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2002.codfw.wmnet
  • 11:30 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2002.codfw.wmnet
  • 11:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1006.eqiad.wmnet
  • 11:28 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
  • 11:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:09 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 11:06 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 10:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 10:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2009.codfw.wmnet
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 10:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 10:49 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 10:45 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
  • 10:36 jayme: updated kubernetes 1.23.14-3 -> 1.23.14-4 on P:kubernetes::node - T362408
  • 10:27 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:26 jayme: re-enable puppet on all P:kubernetes::node
  • 10:26 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 10:09 jayme: disabled puppet on all P:kubernetes::node
  • 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 10:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 09:52 moritzm: installing freetype bugfix updates from Bookworm point update
  • 09:48 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 09:48 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:47 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
  • 09:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:25 jayme: imported kubernetes 1.23.14-4 to component/kubernetes123 (buster, bullseye, bookworm) - T362408
  • 09:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:20 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:17 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host cloudcephosd1005.eqiad.wmnet
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2036.codfw.wmnet to cluster codfw and group C
  • 09:12 Dreamy_Jazz: Maintenance script for T376340 finished
  • 09:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2036.codfw.wmnet to cluster codfw and group C
  • 09:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:06 Dreamy_Jazz: Ran `mwscript-k8s --comment="T376340" -- extensions/GlobalBlocking/maintenance/UpdateAutoBlockParentIdColumn.php --wiki=aawikibooks`
  • 09:01 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
  • 08:55 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 08:55 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:54 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:53 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 08:53 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
  • 08:20 dcausse: repooling wdqs1013
  • 08:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
  • 08:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: maintenance
  • 08:19 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.26 refs T375657
  • 08:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69498 and previous config saved to /var/cache/conftool/dbconfig/20241008-081620-arnaudb.json
  • 08:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69497 and previous config saved to /var/cache/conftool/dbconfig/20241008-080115-arnaudb.json
  • 07:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69496 and previous config saved to /var/cache/conftool/dbconfig/20241008-074609-arnaudb.json
  • 07:44 vgutierrez: uploaded golang-github-jvgutierrez-go-etcd-harness 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
  • 07:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69495 and previous config saved to /var/cache/conftool/dbconfig/20241008-073104-arnaudb.json
  • 07:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: T374215', diff saved to https://phabricator.wikimedia.org/P69494 and previous config saved to /var/cache/conftool/dbconfig/20241008-071559-arnaudb.json
  • 07:10 dcausse: depooling wdqs1013 (lag)
  • 07:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69493 and previous config saved to /var/cache/conftool/dbconfig/20241008-070053-arnaudb.json
  • 06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69492 and previous config saved to /var/cache/conftool/dbconfig/20241008-064548-arnaudb.json
  • 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.23 (duration: 00m 58s)
  • 03:50 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657 (duration: 47m 44s)
  • 03:16 eileen: civicrm upgraded from 8b13ef22 to 61718eae
  • 03:15 eileen: config revision changed from 6e649356 to 9ba217d2
  • 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.26 refs T375657
  • 00:55 eileen: config revision changed from 856e4d99 to 6e649356
  • 00:30 eileen: config revision changed from 856e4d99 to 4ab498d2 - disable process control to load triggers

2024-10-07

  • 22:33 eileen: civicrm upgraded from f2095695 to 8b13ef22
  • 22:09 eileen: config revision changed from a2ba4a8d to 856e4d99
  • 21:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 20:20 urbanecm@deploy2002: Finished scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) (duration: 07m 41s)
  • 20:16 urbanecm@deploy2002: esanders, derenrich, urbanecm: Continuing with sync
  • 20:14 urbanecm@deploy2002: esanders, derenrich, urbanecm: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 urbanecm@deploy2002: Started scap sync-world: Backport for disable the Add A Fact QuickSurvey on enwiki, Enable EditCheck on ru.wiki (T373022)
  • 20:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 19:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 18:22 swfrench-wmf: running `git restore helmfile.d/services/thumbor/values.yaml` on deploy1003 to unblock git-pull timer
  • 18:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
  • 18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-misc2001 to codfw - jhancock@cumin2002"
  • 18:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:29 swfrench@deploy2002: Finished scap sync-world: Testing scap after mw-debug next bring-up - T372604 (duration: 02m 45s)
  • 17:26 swfrench@deploy2002: Started scap sync-world: Testing scap after mw-debug next bring-up - T372604
  • 17:12 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:26 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:24 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:16 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bookworm
  • 16:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 16:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
  • 15:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1003.eqiad.wmnet with reason: RAM expansion
  • 15:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
  • 15:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetserver1002.eqiad.wmnet with reason: RAM expansion
  • 15:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
  • 15:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster1001.eqiad.wmnet
  • 15:00 papaul: ongoing maintenance on mr1-esams
  • 14:43 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 14:40 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 14:18 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bookworm
  • 14:16 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
  • 14:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker2092.codfw.wmnet with reason: Degraded RAID
  • 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367856)', diff saved to https://phabricator.wikimedia.org/P69489 and previous config saved to /var/cache/conftool/dbconfig/20241007-134950-ladsgroup.json
  • 13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
  • 13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69488 and previous config saved to /var/cache/conftool/dbconfig/20241007-134929-ladsgroup.json
  • 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
  • 13:37 vgutierrez: switching to digicert-2024 certificates on esams, eqsin, drmrs and magru
  • 13:36 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:35 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) (duration: 06m 49s)
  • 13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69487 and previous config saved to /var/cache/conftool/dbconfig/20241007-133422-ladsgroup.json
  • 13:31 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 13:30 dreamyjazz@deploy2002: dreamyjazz: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:28 dreamyjazz@deploy2002: Started scap sync-world: Backport for Update globalblocks 'gb_address' index to allow autoblocks (T376052)
  • 13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P69486 and previous config saved to /var/cache/conftool/dbconfig/20241007-131915-ladsgroup.json
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:10 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402) (duration: 07m 14s)
  • 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Continuing with sync
  • 13:05 lucaswerkmeister-wmde@deploy2002: arlolra, lucaswerkmeister-wmde: Backport for scandium is being replaced by parsoidtest1001 (T363402) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69485 and previous config saved to /var/cache/conftool/dbconfig/20241007-130409-ladsgroup.json
  • 13:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for scandium is being replaced by parsoidtest1001 (T363402)
  • 13:02 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2035.codfw.wmnet to cluster codfw and group C
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
  • 12:53 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/%s\n' 'mobile/copyright/wikimaniawiki-wordmark.svg' 'project-logos/wikimaniawiki-1.5x.png' 'project-logos/wikimaniawiki-2x.png' 'project-logos/wikimaniawiki.png' 'icons/wikimaniawiki.svg' | mwscript-k8s --attach -- purgeList enwiki # T376292
  • 12:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 12:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 11:29 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:29 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:25 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:25 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:16 vgutierrez: uploaded golang-github-mtchavez-jenkins 1.0.0 to apt.wm.o (bookworm-wikimedia) - T376600
  • 11:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: T374215', diff saved to https://phabricator.wikimedia.org/P69484 and previous config saved to /var/cache/conftool/dbconfig/20241007-110430-arnaudb.json
  • 10:52 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2002.codfw.wmnet
  • 10:50 Dreamy_Jazz: Started 2-day scan on enwiki for MediaModeration to catch up with monthly request limit - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:49 Dreamy_Jazz: Started MediaModeration scanning script after it crashed for commonswiki - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 10:49 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2002.codfw.wmnet
  • 10:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: T374215', diff saved to https://phabricator.wikimedia.org/P69483 and previous config saved to /var/cache/conftool/dbconfig/20241007-104925-arnaudb.json
  • 10:47 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 10:47 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: T374215', diff saved to https://phabricator.wikimedia.org/P69482 and previous config saved to /var/cache/conftool/dbconfig/20241007-103420-arnaudb.json
  • 10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: T374215', diff saved to https://phabricator.wikimedia.org/P69481 and previous config saved to /var/cache/conftool/dbconfig/20241007-101914-arnaudb.json
  • 10:17 vgutierrez: uploaded golang-github-cloudflare-ipvs 0.10.2 to apt.wm.o (bookworm-wikimedia) - T376600
  • 10:13 moritzm: installing Linux 6.1.112 on Bookworm systems
  • 10:11 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 10:10 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: T374215', diff saved to https://phabricator.wikimedia.org/P69480 and previous config saved to /var/cache/conftool/dbconfig/20241007-100410-arnaudb.json
  • 10:00 vgutierrez: uploaded golang-github-flyingmutant-rapid 1.1.0 to apt.wm.o (bookworm-wikimedia) - T376600
  • 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: T374215', diff saved to https://phabricator.wikimedia.org/P69478 and previous config saved to /var/cache/conftool/dbconfig/20241007-094904-arnaudb.json
  • 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 2%: T374215', diff saved to https://phabricator.wikimedia.org/P69477 and previous config saved to /var/cache/conftool/dbconfig/20241007-093359-arnaudb.json
  • 09:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 09:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 09:27 arnaudb@cumin1002: dbctl commit (dc=all): 'missing commit', diff saved to https://phabricator.wikimedia.org/P69476 and previous config saved to /var/cache/conftool/dbconfig/20241007-092714-arnaudb.json
  • 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69474 and previous config saved to /var/cache/conftool/dbconfig/20241007-091953-arnaudb.json
  • 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: T374215', diff saved to https://phabricator.wikimedia.org/P69473 and previous config saved to /var/cache/conftool/dbconfig/20241007-091854-arnaudb.json
  • 09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
  • 08:37 aqu@deploy2002: Finished deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f] (duration: 04m 43s)
  • 08:32 aqu@deploy2002: Started deploy [airflow-dags/analytics@1699d34]: Refine staging fixes [airflow-dags@1699d34f]
  • 08:24 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 13s)
  • 08:24 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
  • 08:02 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503] (duration: 00m 18s)
  • 08:02 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:02 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@4b69f50]: Stage Refine fixes on test cluster [airflow-dags@4b69f503]
  • 08:02 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 08:01 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:01 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 08:00 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 07:57 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 07:56 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
  • 07:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T374215 db1233 depool as clone source for db1246', diff saved to https://phabricator.wikimedia.org/P69471 and previous config saved to /var/cache/conftool/dbconfig/20241007-075611-arnaudb.json
  • 07:56 hashar: UTC morning backport window completed
  • 07:54 hashar@deploy2002: Finished scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) (duration: 11m 19s)
  • 07:49 hashar@deploy2002: ammarpad, hashar: Continuing with sync
  • 07:45 hashar@deploy2002: ammarpad, hashar: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:43 hashar@deploy2002: Started scap sync-world: Backport for logos: Sync config.yaml and logos.php (T374430), hawiki: Add temporary logo (T376049)
  • 07:42 hashar@deploy2002: Finished scap sync-world: Backport for Revert "wikimaniawiki: Update logos to 2024" (duration: 21m 40s)
  • 07:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 07:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64315
  • 07:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 64315
  • 07:04 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply

2024-10-06

2024-10-05

  • 19:43 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:41 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:36 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:36 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367856)', diff saved to https://phabricator.wikimedia.org/P69470 and previous config saved to /var/cache/conftool/dbconfig/20241005-133058-ladsgroup.json
  • 13:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69469 and previous config saved to /var/cache/conftool/dbconfig/20241005-133036-ladsgroup.json
  • 13:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69468 and previous config saved to /var/cache/conftool/dbconfig/20241005-131529-ladsgroup.json
  • 13:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P69467 and previous config saved to /var/cache/conftool/dbconfig/20241005-130022-ladsgroup.json
  • 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69466 and previous config saved to /var/cache/conftool/dbconfig/20241005-124515-ladsgroup.json

2024-10-04

  • 17:48 ejegg: fundraising civicrm upgraded from 90199f62 to 45855ff4
  • 16:21 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
  • 16:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 14:29 mforns@deploy2002: Finished deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist (duration: 01m 48s)
  • 14:28 mforns@deploy2002: Started deploy [airflow-dags/analytics@4b69f50]: add category to commons impact metrics allowlist
  • 13:54 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 13:33 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.categories-reload (exit_code=97) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 13:32 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 13:19 ayounsi@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 12:00 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 01m 13s)
  • 11:59 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
  • 11:47 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided) (duration: 00m 47s)
  • 11:46 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@9096f1b] (releasing): (no justification provided)
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1004.wikimedia.org
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1004.wikimedia.org
  • 10:07 moritzm: upload ircstream 0.13.0+sse12u1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
  • 09:43 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database shnwikinews (T375432)
  • 09:35 moritzm: upload ircstream 0.13.0+wmf12u1 to apt.wikimedia.org T376014
  • 09:18 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database shnwikinews (T375432)
  • 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database kgewiki (T374814)
  • 09:17 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database kgewiki (T374814)
  • 09:17 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database gorwikiquote (T375094)
  • 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database gorwikiquote (T375094)
  • 09:16 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database madwiktionary (T375023)
  • 09:16 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database madwiktionary (T375023)
  • 09:15 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database moswiki (T375568)
  • 09:15 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database moswiki (T375568)
  • 09:09 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 08:58 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 07:51 oblivian@puppetserver1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
  • 07:51 oblivian@puppetserver1001: conftool action : set/weight=1; selector: dc=eqiad,cluster=kubernetes,name=mw1439.eqiad.wmnet
  • 07:30 hashar: upgrading Jenkins on CI Jenkins
  • 07:04 moritzm: import jenkins 2.462.3 to thirdparty/ci T376449
  • 01:45 ejegg: payments-wiki upgraded from e88750e6 to ed2d78b3

2024-10-03

  • 22:37 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) (duration: 07m 04s)
  • 22:33 brennen@deploy2002: brennen: Continuing with sync
  • 22:32 brennen@deploy2002: brennen: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:30 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
  • 22:18 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
  • 22:18 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
  • 22:15 brennen@deploy2002: scap failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.43.0-wmf.25 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.wmnet/restricted/m
  • 22:15 brennen@deploy2002: Started scap sync-world: Backport for Revert "Turn on Parsoid Selective Update metrics" (T376433)
  • 21:39 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 21:39 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 21:28 brennen: end of UTC late backport & config window
  • 21:28 brennen@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713) (duration: 15m 30s)
  • 21:23 brennen@deploy2002: cscott, brennen: Continuing with sync
  • 21:15 brennen@deploy2002: cscott, brennen: Backport for Turn on Parsoid Selective Update metrics (T371713) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:13 brennen@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Selective Update metrics (T371713)
  • 21:11 brennen@deploy2002: Finished scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) (duration: 10m 09s)
  • 21:06 brennen@deploy2002: cscott, brennen: Continuing with sync
  • 21:02 brennen@deploy2002: cscott, brennen: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:00 brennen@deploy2002: Started scap sync-world: Backport for RefreshLinksJob: Fix exception due to null/false confusion (take 2)
  • 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aqs1022.eqiad.wmnet with OS bullseye
  • 20:44 brennen@deploy2002: Finished scap sync-world: Backport for Update jquery.ime from upstream (duration: 09m 25s)
  • 20:39 brennen@deploy2002: brennen, amire80: Continuing with sync
  • 20:37 brennen@deploy2002: brennen, amire80: Backport for Update jquery.ime from upstream synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 brennen@deploy2002: Started scap sync-world: Backport for Update jquery.ime from upstream
  • 20:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 20:02 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:51 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:50 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:49 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host aqs1022.eqiad.wmnet with OS bullseye
  • 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.categories-reload (exit_code=99) reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:35 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs-categories1001.eqiad.wmnet
  • 19:28 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS (duration: 03m 02s)
  • 19:25 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93] (wcqs): Deploy 0.3.148 to WCQS
  • 19:25 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 19:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 19:22 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@a3efe93]: 0.3.148 (duration: 08m 42s)
  • 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:18 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:16 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:14 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.148` on canary `wdqs1016`; proceeding to rest of fleet
  • 19:14 ryankemper@deploy2002: Started deploy [wdqs/wdqs@a3efe93]: 0.3.148
  • 19:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.148`. Pre-deploy tests passing on canary `wdqs1016`
  • 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:09 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 19:05 dduvall@deploy2002: Installing scap version "4.109.0" for 210 hosts
  • 18:51 cmooney@cumin1002: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org [reason: testing T344171]
  • 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 18:43 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich-next: apply
  • 18:31 cstone: SmashPig upgraded from df2a9c42 to eaa176f7
  • 18:28 sukhe: depool dns1005 for all services for testing T344171
  • 18:00 mutante: codesearch - ran out of disk due to 11G /var/log/account/pacct file - manually ran /etc/cron.daily/acct to rotate it, then deleted old file, back to 39% disk usage
  • 17:41 mutante: codesearch was broken - VM was down - rebooted - restarting all the indices is a bit slow but mostly back up now
  • 17:13 swfrench@deploy2002: Finished scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934 (duration: 02m 50s)
  • 17:11 swfrench@deploy2002: Started scap sync-world: Testing after mediawiki-deployments.yaml format change - T370934
  • 15:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 59.75.192.10.in-addr.arpa on all recursors
  • 15:53 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 59.75.192.10.in-addr.arpa on all recursors
  • 15:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new flag; this should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:51 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:50 topranks: merging patch to add k8s pod IP range reverse delegations to dns T376291
  • 15:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:47 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:46 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:45 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, testing new behavior; this should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2023.codfw.wmnet, repooling both afterwards
  • 15:36 papaul: Junos upgrade on mr1-codfw complete
  • 15:00 papaul: ongoing Junos upgrade on mr1-codfw
  • 14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402 (duration: 03m 33s)
  • 14:52 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@b715af7]: Deploy latest DAGs to the analytics Airflow instance. T373694. T375402
  • 14:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host aqs1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:30 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1022
  • 14:29 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
  • 14:29 jclark@cumin1002: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host aqs1022
  • 14:28 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host aqs1022
  • 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
  • 14:26 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt aqs1022 - jclark@cumin1002"
  • 14:23 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2004.wikimedia.org
  • 13:42 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host irc2004.wikimedia.org
  • 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2004.wikimedia.org
  • 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2004.wikimedia.org with OS bookworm
  • 13:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:31 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:30 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 13:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
  • 13:23 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2004.wikimedia.org with reason: host reimage
  • 13:10 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2004.wikimedia.org with OS bookworm
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:09 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2004.wikimedia.org on all recursors
  • 13:09 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2004.wikimedia.org on all recursors
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:09 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2004.wikimedia.org - elukey@cumin1002"
  • 13:00 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 13:00 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2004.wikimedia.org
  • 12:20 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124) (duration: 06m 47s)
  • 12:14 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
  • 12:13 urbanecm@deploy2002: scap failed: <UnboundLocalError> local variable 'e' referenced before assignment (scap version: 4.108.0-1) (duration: 08m 02s)
  • 12:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:05 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMenteesJob: Do not schedule follow-up jobs when first job fails (T376124)
  • 12:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367856)', diff saved to https://phabricator.wikimedia.org/P69458 and previous config saved to /var/cache/conftool/dbconfig/20241003-111544-ladsgroup.json
  • 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69457 and previous config saved to /var/cache/conftool/dbconfig/20241003-111522-ladsgroup.json
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69456 and previous config saved to /var/cache/conftool/dbconfig/20241003-110015-ladsgroup.json
  • 10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P69454 and previous config saved to /var/cache/conftool/dbconfig/20241003-104508-ladsgroup.json
  • 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69453 and previous config saved to /var/cache/conftool/dbconfig/20241003-103001-ladsgroup.json
  • 10:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124) (duration: 06m 54s)
  • 10:29 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:22 urbanecm@deploy2002: Started scap sync-world: Backport for Backport ReassignMenteesJob-related changes (T376124)
  • 10:11 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM irc1004.wikimedia.org
  • 10:00 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@b715af7]: T375153 (duration: 02m 44s)
  • 10:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM irc1004.wikimedia.org
  • 09:58 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@b715af7]: T375153
  • 09:42 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:41 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:38 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:38 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.25 refs T375656
  • 08:25 hashar@deploy2002: Finished scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) (duration: 07m 07s)
  • 08:20 hashar@deploy2002: hashar, cscott: Continuing with sync
  • 08:20 hashar@deploy2002: hashar, cscott: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:18 hashar@deploy2002: Started scap sync-world: Backport for Deprecate ParserOutput::setLanguageLinks(null) (T376323)
  • 08:14 hashar@deploy2002: Finished scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) (duration: 08m 37s)
  • 08:09 hashar@deploy2002: hashar, hamishz: Continuing with sync
  • 08:07 hashar@deploy2002: hashar, hamishz: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:05 hashar@deploy2002: Started scap sync-world: Backport for bjnwiki: Update logo (T375055), bjnwiktionary: Add logo (T374898)
  • 08:03 hashar: Ran `mwscript resetAuthenticationThrottle.php --signup --ip 14.139.82.6` for `metawiki`, `mediawikiwiki` and `wikidatawiki` # T375794
  • 07:59 hashar@deploy2002: Finished scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) (duration: 08m 41s)
  • 07:54 hashar@deploy2002: anzx, hamishz, hashar: Continuing with sync
  • 07:53 hashar@deploy2002: anzx, hamishz, hashar: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:50 hashar@deploy2002: Started scap sync-world: Backport for throttle.php: Remove expired throttle, IP limit exemption for WTS 2024 (T375794)
  • 07:17 kartik@deploy2002: Finished scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) (duration: 10m 39s)
  • 07:12 kartik@deploy2002: kartik: Continuing with sync
  • 07:08 kartik@deploy2002: kartik: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:06 kartik@deploy2002: Started scap sync-world: Backport for Section Translation: Add mos, kde and rsk Wikipedias (T375017 T374815 T374644)
  • 06:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 06:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply

2024-10-02

  • 23:47 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124) (duration: 07m 07s)
  • 23:39 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "logging: Enable logging for debug GrowthExperiments events" (T376124)
  • 22:35 urbanecm@deploy2002: Finished scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124) (duration: 06m 52s)
  • 22:28 urbanecm@deploy2002: Started scap sync-world: Backport for logging: Enable logging for debug GrowthExperiments events (T376124)
  • 21:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
  • 21:54 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs-categories1001.eqiad.wmnet with reason: T375687
  • 21:24 mutante: phab1004 - link=$(/usr/bin/readlink -f /srv/phab) ; /usr/bin/git config -f /etc/gitconfig.d/10-phab-deploy-safedir.gitconfig --add safe.directory $link ; /bin/cat /etc/gitconfig.d/*.gitconfig > /etc/gitconfig - T360756
  • 20:57 eileen: civicrm upgraded from 28fd5e3b to 90199f62
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1001.eqiad.wmnet with OS bookworm
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-misc1002.eqiad.wmnet with OS bookworm
  • 19:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
  • 19:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
  • 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1001.eqiad.wmnet with reason: host reimage
  • 19:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-misc1002.eqiad.wmnet with reason: host reimage
  • 19:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1002.eqiad.wmnet with OS bookworm
  • 19:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-misc1001.eqiad.wmnet with OS bookworm
  • 19:23 cstone: SmashPig upgraded from 715e91fa to df2a9c42
  • 19:21 brett: cumin -b11 "A:cp" "run-puppet-agent --enable 'rolling out 1038884'"
  • 19:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:15 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 19:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
  • 19:06 brett@cumin2002: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
  • 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2004-dev']
  • 18:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2004-dev']
  • 18:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:21 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256 (duration: 00m 12s)
  • 18:21 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 24.9.1 - T376256
  • 18:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.25 refs T375656
  • 18:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudlb2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:22 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
  • 17:20 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 17:02 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1003.eqiad.wmnet
  • 17:02 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
  • 17:01 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
  • 17:00 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) (duration: 14m 42s)
  • 16:58 btullis@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
  • 16:56 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts alert[1001,2001].wikimedia.org
  • 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:50 denisse@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
  • 16:49 denisse@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: alert[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - denisse@cumin2002"
  • 16:48 urbanecm@deploy2002: urbanecm: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:46 denisse@cumin2002: START - Cookbook sre.dns.netbox
  • 16:46 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMentees: Add additional logging (T376124), ReassignMentees: Add additional logging (T376124)
  • 16:38 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
  • 16:33 taavi: start extensions/GlobalUsage/maintenance/refreshGlobalimagelinks.php on labswiki to backfill global usage information
  • 16:31 taavi@deploy2002: Finished scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php (duration: 07m 13s)
  • 16:31 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 16:27 denisse@cumin2002: START - Cookbook sre.hosts.decommission for hosts alert[1001,2001].wikimedia.org
  • 16:27 denisse: Running the sre.hosts.decommission cookbook on the alert1001 and alert2001 hosts - T372607
  • 16:27 taavi@deploy2002: matmarex, taavi: Continuing with sync
  • 16:26 taavi@deploy2002: matmarex, taavi: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:24 taavi@deploy2002: Started scap sync-world: Backport for Add wikitech.wikimedia.org to $wgCrossSiteAJAXdomains, logging: Remove unused global $wmgMonologProcessors, Remove references to removed wikitech.php
  • 16:16 taavi@deploy2002: Finished scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) (duration: 07m 01s)
  • 16:11 taavi@deploy2002: zabe, taavi: Continuing with sync
  • 16:11 taavi@deploy2002: zabe, taavi: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:09 taavi@deploy2002: Started scap sync-world: Backport for reverse-proxy: Drop all public ips except cloudweb2002-dev.codfw.wmnet (T292707)
  • 16:03 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host wdqs-categories1001.eqiad.wmnet
  • 16:03 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs-categories1001.eqiad.wmnet with OS bullseye
  • 15:46 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:45 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:43 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:43 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:41 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:41 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:38 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:38 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:37 cdanis@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:36 cdanis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:35 cdanis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:35 cdanis@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:34 cdanis@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:31 cdanis@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:31 cdanis@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:30 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3a7901e]: T375153 (duration: 01m 59s)
  • 15:28 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 15:28 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 15:28 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3a7901e]: T375153
  • 15:27 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - T370962
  • 15:26 dancy@deploy2002: Finished scap sync-world: Testing T370934 (duration: 03m 19s)
  • 15:24 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:23 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: test I946dd0 with dummy upgrade
  • 15:22 dancy@deploy2002: Started scap sync-world: Testing T370934
  • 15:18 dancy@deploy2002: Installation of scap version "4.108.0" completed for 210 hosts
  • 15:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
  • 15:14 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on registry1004.eqiad.wmnet with reason: testing
  • 15:13 dancy@deploy2002: Installing scap version "4.108.0" for 210 hosts
  • 15:12 cdanis@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:12 cdanis@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:07 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - T370962
  • 15:07 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:00 swfrench@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 15:00 swfrench@cumin1002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:56 elukey@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs-categories1001.eqiad.wmnet with OS bullseye
  • 14:46 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs-categories1001.eqiad.wmnet on all recursors
  • 14:45 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs-categories1001.eqiad.wmnet on all recursors
  • 14:45 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:44 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM wdqs-categories1001.eqiad.wmnet - bking@cumin2002"
  • 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc1004.wikimedia.org
  • 14:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc1004.wikimedia.org with OS bookworm
  • 14:30 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:30 bking@cumin2002: START - Cookbook sre.ganeti.makevm for new host wdqs-categories1001.eqiad.wmnet
  • 14:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bookworm
  • 14:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
  • 14:22 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc1004.wikimedia.org with reason: host reimage
  • 14:21 urbanecm@deploy2002: Finished scap sync-world: Backport for labswiki: Disallow account autocreation (T161859) (duration: 07m 38s)
  • 14:17 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 14:16 urbanecm@deploy2002: urbanecm: Backport for labswiki: Disallow account autocreation (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:14 urbanecm@deploy2002: Started scap sync-world: Backport for labswiki: Disallow account autocreation (T161859)
  • 14:12 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc1004.wikimedia.org with OS bookworm
  • 14:11 hashar@deploy2002: Finished scap sync-world: Backport for Remove Maintenance check (T376255) (duration: 07m 27s)
  • 14:08 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:08 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc1004.wikimedia.org on all recursors
  • 14:07 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc1004.wikimedia.org on all recursors
  • 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:07 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:07 hashar@deploy2002: hashar: Continuing with sync
  • 14:06 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc1004.wikimedia.org - elukey@cumin1002"
  • 14:04 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
  • 14:03 hashar@deploy2002: Sync cancelled.
  • 14:03 hashar@deploy2002: hashar: Backport for Remove Maintenance check (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 14:03 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc1004.wikimedia.org
  • 14:01 hashar@deploy2002: Started scap sync-world: Backport for Remove Maintenance check (T376255)
  • 13:31 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:28 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242) (duration: 10m 32s)
  • 13:24 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Continuing with sync
  • 13:20 lucaswerkmeister-wmde@deploy2002: wmde-fisch, lucaswerkmeister-wmde: Backport for Improve sub-ref check to avoid false positives (T376242) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve sub-ref check to avoid false positives (T376242)
  • 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) (duration: 14m 45s)
  • 13:16 moritzm: upload ircstream 0.13.0~dev+wmf1 to apt.wikimedia.org bookworm/ircstream-sse component (separate build using the experimental eventstream feature branch of ircstream) T376014
  • 13:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:09 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 13:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, daimona: Backport for [zhwiki] Enable the CampaignEvents extension (T373821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for [zhwiki] Enable the CampaignEvents extension (T373821)
  • 12:59 moritzm: upload python3-aiohttp-sse-client 0.2.1-0 to apt.wikimedia.org bookworm/ircstream-sse component (needed by the eventstream feature branch of ircstream) T376014
  • 12:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
  • 12:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: UEFI test
  • 12:49 hashar@deploy2002: Finished scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) (duration: 07m 01s)
  • 12:45 hashar@deploy2002: hashar, zabe: Continuing with sync
  • 12:45 hashar@deploy2002: hashar, zabe: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:42 hashar@deploy2002: Started scap sync-world: Backport for Use wgDonationInterfaceFundraiserMaintenance (T376255)
  • 12:39 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 12:35 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 12:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:14 zabe@deploy2002: Finished scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) (duration: 08m 50s)
  • 12:13 jayme@cumin1002: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bookworm
  • 12:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:11 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 12:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:09 zabe@deploy2002: zabe: Continuing with sync
  • 12:09 zabe@deploy2002: zabe: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
  • 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
  • 12:08 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
  • 12:08 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
  • 12:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:06 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
  • 12:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:05 zabe@deploy2002: Started scap sync-world: Backport for s6: Reduce revision-slots cache expiry to 60s (T183490 T376129)
  • 12:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:57 _joe_: restarted rsyslog on kubernetes1045
  • 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1005.eqiad.wmnet
  • 10:46 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
  • 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
  • 10:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1005.eqiad.wmnet with reason: host reimage
  • 10:17 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1005.eqiad.wmnet with OS bullseye
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:13 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1005.eqiad.wmnet on all recursors
  • 10:13 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1005.eqiad.wmnet on all recursors
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:11 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1005.eqiad.wmnet - elukey@cumin1002"
  • 10:04 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 10:04 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1005.eqiad.wmnet
  • 10:03 elukey@deploy2002: Finished scap sync-world: Backport for Add irc2003 to the irc settings (T376014) (duration: 07m 11s)
  • 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1004.eqiad.wmnet
  • 10:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
  • 09:59 elukey@deploy2002: elukey: Continuing with sync
  • 09:58 elukey@deploy2002: elukey: Backport for Add irc2003 to the irc settings (T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:56 elukey@deploy2002: Started scap sync-world: Backport for Add irc2003 to the irc settings (T376014)
  • 09:54 elukey@deploy2002: Finished scap sync-world: Add irc2003 to the network policies (duration: 02m 15s)
  • 09:53 elukey@deploy2002: Started scap sync-world: Add irc2003 to the network policies
  • 09:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
  • 09:47 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd1004.eqiad.wmnet with reason: host reimage
  • 09:44 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:44 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:43 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:43 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:42 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:42 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd1004.eqiad.wmnet with OS bullseye
  • 09:31 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to [php-1.43.0-wmf.24]" - T375656
  • 09:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships" "Zabe" --reason "per request T376246"
  • 09:23 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:23 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1004.eqiad.wmnet on all recursors
  • 09:22 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1004.eqiad.wmnet on all recursors
  • 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:22 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:21 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd1004.eqiad.wmnet - elukey@cumin1002"
  • 09:17 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 09:17 jynus@cumin1002: dbctl commit (dc=all): 'Set es2024 to weight 10 as the rest of es-rw hosts T376249', diff saved to https://phabricator.wikimedia.org/P69443 and previous config saved to /var/cache/conftool/dbconfig/20241002-091754-jynus.json
  • 09:17 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1004.eqiad.wmnet
  • 09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-ctrl1004.eqiad.wmnet
  • 09:16 elukey@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 09:16 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 09:16 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1004.eqiad.wmnet
  • 09:13 vgutierrez: repooling cp3071 and cp3072 after HW maintenance - T374986
  • 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[3071-3072].esams.wmnet
  • 09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp[3071-3072].esams.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-ctrl1001.eqiad.wmnet
  • 08:57 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-ctrl1001.eqiad.wmnet
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host aux-k8s-worker1001.eqiad.wmnet
  • 08:55 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host aux-k8s-worker1001.eqiad.wmnet
  • 08:55 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided) (duration: 00m 52s)
  • 08:54 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@3b76c68]: (no justification provided)
  • 08:36 jayme: removed the label node-role.kubernetes.io/master and the taint node-role.kubernetes.io/master:NoSchedule from all k8s apiservers - T334234
  • 08:32 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to all k8s apiservers - T334234
  • 08:29 hashar: Restarted stashbot based on instructions at https://wikitech.wikimedia.org/wiki/Tool:Stashbot
  • 08:20 hashar@deploy2002: Finished scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967) (duration: 10m 27s)
  • 08:16 hashar@deploy2002: hashar, sfaci: Continuing with sync
  • 08:12 hashar@deploy2002: hashar, sfaci: Backport for Metrics Platform monotable: Base stream configuration (T373967) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:10 hashar@deploy2002: Started scap sync-world: Backport for Metrics Platform monotable: Base stream configuration (T373967)
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 07:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
  • 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
  • 07:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
  • 07:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[3071-3072].esams.wmnet with reason: HW maintenance
  • 06:50 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1497 hosts
  • 06:49 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1497 hosts
  • 06:48 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 706 hosts
  • 06:48 root@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 706 hosts
  • 02:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host logging-hd2005.codfw.wmnet with OS bookworm

2024-10-01

  • 23:42 zabe: zabe@mwmaint2002:~$ cat /home/zabe/s3.txt | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTable.php {} --skip /home/zabe/text_table_cleanup/{} --dump /home/zabe/text_table_dump/{} --sleep 1" # T183490
  • 20:34 hashar: UTC late backport window completed
  • 20:28 hashar: mwscript purgeList.php --wiki=tlywiki --namespace=4 # T367009
  • 20:12 hashar@deploy2002: Finished scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009) (duration: 07m 21s)
  • 20:07 hashar@deploy2002: nmw03, hashar: Continuing with sync
  • 20:06 hashar@deploy2002: nmw03, hashar: Backport for Update wgMetaNamespace for tlywiki (T367009) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:04 hashar@deploy2002: Started scap sync-world: Backport for Update wgMetaNamespace for tlywiki (T367009)
  • 20:02 hashar: Restarting CI Jenkins
  • 19:48 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 19:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 17:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140) (duration: 09m 03s)
  • 17:56 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 17:55 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:53 ladsgroup@deploy2002: ladsgroup: Backport for Allow storing of passwords for local users in wikitech (T376140) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:50 ladsgroup@deploy2002: Started scap sync-world: Backport for Allow storing of passwords for local users in wikitech (T376140)
  • 17:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2004-dev.codfw.wmnet with OS bookworm
  • 16:00 ladsgroup@deploy2002: taavi, ladsgroup: Continuing with sync
  • 15:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:58 ladsgroup@deploy2002: taavi, ladsgroup: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should succeed) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling both afterwards
  • 15:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Make Wikitech behave a bit more like a SUL wiki (T371374)
  • 15:54 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
  • 15:54 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T364077, this test transfer should fail) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1023.eqiad.wmnet, repooling both afterwards
  • 15:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:39 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 15:07 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 15:05 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149 (duration: 01m 07s)
  • 15:04 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: deploy phab1004 for T376149
  • 15:03 brennen@deploy2002: Finished deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149 (duration: 00m 30s)
  • 15:03 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:03 brennen@deploy2002: Started deploy [phabricator/deployment@33a2c8d]: test deploy phab2002 for T376149
  • 15:02 jelto@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:01 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:01 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 14:45 jayme: added the taint node-role.kubernetes.io/control-plane:NoSchedule to wikikube staging apiservers - T334234
  • 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host logging-hd2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:15 jayme: added the label node-role.kubernetes.io/control-plane= to all k8s apiservers - T334234
  • 14:10 moritzm: installing cups security updates
  • 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 13:49 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 13:32 elukey@puppetserver1001: conftool action : set/weight=1; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 13:32 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl1003.eqiad.wmnet
  • 13:31 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 13:31 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1003.eqiad.wmnet
  • 13:21 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 12:28 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859) (duration: 07m 51s)
  • 12:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:23 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=labswiki --undo /home/ladsgroup/T376129.undo.sql DB cluster31 (T376129)
  • 12:22 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Allow 'crats to rename local users (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:20 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Allow 'crats to rename local users (T161859)
  • 12:17 ladsgroup@deploy2002: Finished scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129) (duration: 09m 53s)
  • 12:12 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 12:09 ladsgroup@deploy2002: ladsgroup: Backport for Wikitech: Connect wikitech to external storage (T376129) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:07 ladsgroup@deploy2002: Started scap sync-world: Backport for Wikitech: Connect wikitech to external storage (T376129)
  • 12:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859) (duration: 09m 53s)
  • 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:54 ladsgroup@deploy2002: ladsgroup: Backport for wikitech: Soft connect wikitech to SUL (T161859) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:52 ladsgroup@deploy2002: Started scap sync-world: Backport for wikitech: Soft connect wikitech to SUL (T161859)
  • 11:51 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 11:49 ladsgroup@deploy2002: Finished scap sync-world: Backport for Drop wikitech.php (T371592 T371374) (duration: 07m 32s)
  • 11:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:44 ladsgroup@deploy2002: ladsgroup: Backport for Drop wikitech.php (T371592 T371374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:42 ladsgroup@deploy2002: Started scap sync-world: Backport for Drop wikitech.php (T371592 T371374)
  • 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host irc2003.wikimedia.org
  • 11:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host irc2003.wikimedia.org with OS bookworm
  • 11:16 effie: Switching wikitech to k8s - T292707
  • 11:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
  • 11:09 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on irc2003.wikimedia.org with reason: host reimage
  • 11:01 jiji@deploy2002: Finished scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) (duration: 08m 23s)
  • 10:56 jiji@deploy2002: jiji: Continuing with sync
  • 10:55 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:52 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
  • 10:48 jiji@deploy2002: Sync cancelled.
  • 10:44 jiji@deploy2002: jiji: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:44 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:44 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-staging2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2011.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:42 jiji@deploy2002: Started scap sync-world: Backport for wikitech: de-wikitech mediawiki-config (T371537 T371592 T371374 T371359)
  • 10:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:40 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host parsoidtest1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:36 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host deploy1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:35 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:35 elukey@cumin2002: START - Cookbook sre.hosts.provision for host krb1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:33 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2035.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:26 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:23 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:21 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host irc2003.wikimedia.org with OS bookworm
  • 10:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:15 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) irc2003.wikimedia.org on all recursors
  • 10:15 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache irc2003.wikimedia.org on all recursors
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:15 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM irc2003.wikimedia.org - elukey@cumin1002"
  • 10:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
  • 10:11 elukey@cumin1002: START - Cookbook sre.ganeti.makevm for new host irc2003.wikimedia.org
  • 10:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:06 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
  • 10:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
  • 09:59 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:57 elukey@cumin2002: START - Cookbook sre.hosts.provision for host an-conf1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:24 jmm@deploy2002: Finished scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) (duration: 08m 07s)
  • 09:19 jmm@deploy2002: jmm: Continuing with sync
  • 09:19 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
  • 09:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1063.eqiad.wmnet with reason: cloudvirt1063 needs maintenance T375223
  • 09:18 jmm@deploy2002: jmm: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:16 jmm@deploy2002: Started scap sync-world: Backport for Remove irc1001/irc2001 from mediawiki-config and add irc1003 (T331702 T376014)
  • 09:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367856)', diff saved to https://phabricator.wikimedia.org/P69437 and previous config saved to /var/cache/conftool/dbconfig/20241001-090708-ladsgroup.json
  • 09:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:58 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.25 refs T375656
  • 08:46 urbanecm@deploy2002: Finished scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784) (duration: 06m 58s)
  • 08:39 urbanecm@deploy2002: Started scap sync-world: Backport for DatabaseMentorStore: Cast user IDs to integers before looking them up (T375784)
  • 07:58 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
  • 07:54 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T375382
  • 07:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
  • 07:39 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215
  • 07:34 kartik@deploy2002: Finished scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979) (duration: 10m 05s)
  • 07:30 kartik@deploy2002: kartik, melos: Continuing with sync
  • 07:26 kartik@deploy2002: kartik, melos: Backport for Add namespace aliases for scn.wikipedia (T375979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:24 kartik@deploy2002: Started scap sync-world: Backport for Add namespace aliases for scn.wikipedia (T375979)
  • 07:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460) (duration: 18m 15s)
  • 07:14 kartik@deploy2002: kartik, abi: Continuing with sync
  • 07:09 kartik@deploy2002: kartik, abi: Backport for Enable translation settings banner for Test wikipedia (T372460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:03 kartik@deploy2002: Started scap sync-world: Backport for Enable translation settings banner for Test wikipedia (T372460)
  • 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 705 hosts
  • 06:47 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 705 hosts
  • 06:47 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Luke Bowmaker out of all services on: 1497 hosts
  • 06:46 root@cumin2002: START - Cookbook sre.idm.logout Logging Luke Bowmaker out of all services on: 1497 hosts
  • 06:44 XioNoX: cr3-ulsfo> request vmhost snapshot - T375345
  • 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.22 (duration: 00m 58s)
  • 03:51 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656 (duration: 48m 36s)
  • 03:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.43.0-wmf.25 refs T375656
  • 02:47 eileen: civicrm upgraded from cf27c789 to 28fd5e3b
  • 02:17 ejegg: email preference center upgraded from 8ff002ef to e88750e6
  • 02:16 ejegg: payments-wiki upgraded from 8d3b8e94 to e88750e6

Archives

See Server Admin Log/Archives.