Server Admin Log/Archive 95

From Wikitech


2025-07-31

  • 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P80402 and previous config saved to /var/cache/conftool/dbconfig/20250731-235547-ladsgroup.json
  • 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P80401 and previous config saved to /var/cache/conftool/dbconfig/20250731-234040-ladsgroup.json
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T400854)', diff saved to https://phabricator.wikimedia.org/P80400 and previous config saved to /var/cache/conftool/dbconfig/20250731-232532-ladsgroup.json
  • 23:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T400854)', diff saved to https://phabricator.wikimedia.org/P80399 and previous config saved to /var/cache/conftool/dbconfig/20250731-232304-ladsgroup.json
  • 23:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T400854)', diff saved to https://phabricator.wikimedia.org/P80398 and previous config saved to /var/cache/conftool/dbconfig/20250731-232241-ladsgroup.json
  • 23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P80397 and previous config saved to /var/cache/conftool/dbconfig/20250731-230734-ladsgroup.json
  • 22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P80396 and previous config saved to /var/cache/conftool/dbconfig/20250731-225226-ladsgroup.json
  • 22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T400854)', diff saved to https://phabricator.wikimedia.org/P80395 and previous config saved to /var/cache/conftool/dbconfig/20250731-223719-ladsgroup.json
  • 22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T400854)', diff saved to https://phabricator.wikimedia.org/P80394 and previous config saved to /var/cache/conftool/dbconfig/20250731-223453-ladsgroup.json
  • 22:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 22:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 22:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T400854)', diff saved to https://phabricator.wikimedia.org/P80393 and previous config saved to /var/cache/conftool/dbconfig/20250731-223339-ladsgroup.json
  • 22:31 dancy@deploy1003: build-images aborted: Publishing wmf/next image (duration: 01m 05s)
  • 22:30 dancy@deploy1003: Started scap build-images: Publishing wmf/next image
  • 22:29 dancy@deploy1003: Installation of scap version "4.194.2" completed for 2 hosts
  • 22:27 dancy@deploy1003: Installing scap version "4.194.2" for 2 host(s)
  • 22:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P80392 and previous config saved to /var/cache/conftool/dbconfig/20250731-221832-ladsgroup.json
  • 22:11 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp70[03-16].magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade ()
  • 22:09 dancy@deploy1003: Installation of scap version "4.193.0" completed for 2 hosts
  • 22:08 dancy@deploy1003: Installing scap version "4.193.0" for 2 host(s)
  • 22:05 zabe@deploy1003: Finished scap sync-world: Backport for Stop setting wgGlobalUsageDatabase (T400169) (duration: 08m 24s)
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir1001.eqiad.wmnet
  • 22:04 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for ncredir1001.eqiad.wmnet
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir1002.eqiad.wmnet
  • 22:03 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for ncredir1002.eqiad.wmnet
  • 22:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir2001.codfw.wmnet
  • 22:03 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for ncredir2001.codfw.wmnet
  • 22:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P80391 and previous config saved to /var/cache/conftool/dbconfig/20250731-220324-ladsgroup.json
  • 21:59 zabe@deploy1003: zabe: Continuing with sync
  • 21:58 zabe@deploy1003: zabe: Backport for Stop setting wgGlobalUsageDatabase (T400169) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:56 zabe@deploy1003: Started scap sync-world: Backport for Stop setting wgGlobalUsageDatabase (T400169)
  • 21:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T400854)', diff saved to https://phabricator.wikimedia.org/P80390 and previous config saved to /var/cache/conftool/dbconfig/20250731-214817-ladsgroup.json
  • 21:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T400854)', diff saved to https://phabricator.wikimedia.org/P80389 and previous config saved to /var/cache/conftool/dbconfig/20250731-214654-ladsgroup.json
  • 21:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 21:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80388 and previous config saved to /var/cache/conftool/dbconfig/20250731-214631-ladsgroup.json
  • 21:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: known and WIP
  • 21:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: known and WIP
  • 21:42 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: known and WIP
  • 21:38 dancy@deploy1003: Finished scap sync-world: Backport for Fix placement of toolbar insert group on mobile (T400933) (duration: 13m 54s)
  • 21:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P80387 and previous config saved to /var/cache/conftool/dbconfig/20250731-213124-ladsgroup.json
  • 21:30 dancy@deploy1003: dancy, kemayo: Continuing with sync
  • 21:29 dancy@deploy1003: dancy, kemayo: Backport for Fix placement of toolbar insert group on mobile (T400933) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:24 dancy@deploy1003: Started scap sync-world: Backport for Fix placement of toolbar insert group on mobile (T400933)
  • 21:23 dancy@deploy1003: Installation of scap version "4.194.1" completed for 2 hosts
  • 21:22 brett: Deleting /var/lib/acme-chief/certs/non-canonical-redirect-{13..18} from acme-chief to force regeneration of certs
  • 21:21 dancy@deploy1003: Installing scap version "4.194.1" for 2 host(s)
  • 21:16 dancy@deploy1003: Installation of scap version "4.193.0" completed for 2 hosts
  • 21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P80386 and previous config saved to /var/cache/conftool/dbconfig/20250731-211616-ladsgroup.json
  • 21:15 dancy@deploy1003: Installing scap version "4.193.0" for 2 host(s)
  • 21:14 dancy@deploy1003: Finished scap sync-world: Backport for Fix placement of toolbar insert group on mobile (T400933) (duration: 33m 54s)
  • 21:02 dancy@deploy1003: kemayo, dancy: Continuing with sync
  • 21:01 dancy@deploy1003: kemayo, dancy: Backport for Fix placement of toolbar insert group on mobile (T400933) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80385 and previous config saved to /var/cache/conftool/dbconfig/20250731-210108-ladsgroup.json
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80384 and previous config saved to /var/cache/conftool/dbconfig/20250731-205844-ladsgroup.json
  • 20:58 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T400854)', diff saved to https://phabricator.wikimedia.org/P80383 and previous config saved to /var/cache/conftool/dbconfig/20250731-205821-ladsgroup.json
  • 20:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp70[03-16].magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade ()
  • 20:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P80382 and previous config saved to /var/cache/conftool/dbconfig/20250731-204314-ladsgroup.json
  • 20:40 dancy@deploy1003: Started scap sync-world: Backport for Fix placement of toolbar insert group on mobile (T400933)
  • 20:35 kemayo@deploy1003: kemayo: Backport for Fix placement of toolbar insert group on mobile (T400933) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P80379 and previous config saved to /var/cache/conftool/dbconfig/20250731-202806-ladsgroup.json
  • 20:23 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7002.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade ()
  • 20:21 kemayo@deploy1003: Started scap sync-world: Backport for Fix placement of toolbar insert group on mobile (T400933)
  • 20:17 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7002.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade ()
  • 20:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T400854)', diff saved to https://phabricator.wikimedia.org/P80378 and previous config saved to /var/cache/conftool/dbconfig/20250731-201259-ladsgroup.json
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T400854)', diff saved to https://phabricator.wikimedia.org/P80377 and previous config saved to /var/cache/conftool/dbconfig/20250731-201026-ladsgroup.json
  • 20:10 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80376 and previous config saved to /var/cache/conftool/dbconfig/20250731-201003-ladsgroup.json
  • 20:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1035.eqiad.wmnet with OS bookworm
  • 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P80375 and previous config saved to /var/cache/conftool/dbconfig/20250731-195455-ladsgroup.json
  • 19:48 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
  • 19:45 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1035.eqiad.wmnet with reason: host reimage
  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P80374 and previous config saved to /var/cache/conftool/dbconfig/20250731-193948-ladsgroup.json
  • 19:32 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.12 refs T396373
  • 19:26 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1035.eqiad.wmnet with OS bookworm
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80373 and previous config saved to /var/cache/conftool/dbconfig/20250731-192440-ladsgroup.json
  • 19:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80372 and previous config saved to /var/cache/conftool/dbconfig/20250731-192208-ladsgroup.json
  • 19:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T400854)', diff saved to https://phabricator.wikimedia.org/P80371 and previous config saved to /var/cache/conftool/dbconfig/20250731-192144-ladsgroup.json
  • 19:19 brennen@deploy1003: Finished scap sync-world: Backport for Notifications: fix type error and add regression test (T400899) (duration: 08m 46s)
  • 19:14 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:14 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:14 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:13 brennen@deploy1003: brennen: Continuing with sync
  • 19:13 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:13 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:12 brennen@deploy1003: brennen: Backport for Notifications: fix type error and add regression test (T400899) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:12 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:12 ejegg: restarted automated fundraising jobs
  • 19:10 brennen@deploy1003: Started scap sync-world: Backport for Notifications: fix type error and add regression test (T400899)
  • 19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P80370 and previous config saved to /var/cache/conftool/dbconfig/20250731-190637-ladsgroup.json
  • 18:59 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:59 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:58 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:58 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:57 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:57 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:53 ejegg: disabled automated fundraising jobs for table alter
  • 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P80369 and previous config saved to /var/cache/conftool/dbconfig/20250731-185129-ladsgroup.json
  • 18:44 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:43 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T400854)', diff saved to https://phabricator.wikimedia.org/P80368 and previous config saved to /var/cache/conftool/dbconfig/20250731-183622-ladsgroup.json
  • 18:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T400854)', diff saved to https://phabricator.wikimedia.org/P80367 and previous config saved to /var/cache/conftool/dbconfig/20250731-183353-ladsgroup.json
  • 18:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T400854)', diff saved to https://phabricator.wikimedia.org/P80366 and previous config saved to /var/cache/conftool/dbconfig/20250731-183248-ladsgroup.json
  • 18:29 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c7-eqiad
  • 18:29 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c7-eqiad
  • 18:27 brennen: train 1.45.0-wmf.12 status (T396373): blocked on T400899, holding at group1
  • 18:23 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c6-eqiad
  • 18:23 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c6-eqiad
  • 18:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P80365 and previous config saved to /var/cache/conftool/dbconfig/20250731-181740-ladsgroup.json
  • 18:15 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c5-eqiad
  • 18:15 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c5-eqiad
  • 18:06 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c4-eqiad
  • 18:05 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c4-eqiad
  • 18:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P80364 and previous config saved to /var/cache/conftool/dbconfig/20250731-180233-ladsgroup.json
  • 18:01 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7001.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade ()
  • 17:57 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c3-eqiad
  • 17:57 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c3-eqiad
  • 17:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7001.magru.wmnet} and A:cp - 9.2.11-1wm2 upgrade ()
  • 17:53 brett: Import trafficserver 9.2.11-1wm2 into bullseye-wikimedia
  • 17:51 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c2-eqiad
  • 17:51 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c2-eqiad
  • 17:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T400854)', diff saved to https://phabricator.wikimedia.org/P80363 and previous config saved to /var/cache/conftool/dbconfig/20250731-174725-ladsgroup.json
  • 17:42 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d8-eqiad
  • 17:42 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d8-eqiad
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T400854)', diff saved to https://phabricator.wikimedia.org/P80362 and previous config saved to /var/cache/conftool/dbconfig/20250731-174106-ladsgroup.json
  • 17:40 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:40 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T400854)', diff saved to https://phabricator.wikimedia.org/P80361 and previous config saved to /var/cache/conftool/dbconfig/20250731-173955-ladsgroup.json
  • 17:35 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1034.eqiad.wmnet with OS bookworm
  • 17:29 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d7-eqiad
  • 17:28 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d7-eqiad
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P80360 and previous config saved to /var/cache/conftool/dbconfig/20250731-172447-ladsgroup.json
  • 17:22 sukhe: sudo cumin -b21 "A:cp" "run-puppet-agent --enable 'merging CR 1174685'"
  • 17:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
  • 17:14 cmooney@cumin1003: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device lsw1-d6-eqiad
  • 17:09 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1034.eqiad.wmnet with reason: host reimage
  • 17:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P80359 and previous config saved to /var/cache/conftool/dbconfig/20250731-170940-ladsgroup.json
  • 17:08 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d6-eqiad
  • 17:07 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1174685'"
  • 17:06 brett@dns1004: END - running authdns-update
  • 17:05 brett@dns1004: START - running authdns-update
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T400854)', diff saved to https://phabricator.wikimedia.org/P80358 and previous config saved to /var/cache/conftool/dbconfig/20250731-165432-ladsgroup.json
  • 16:51 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1034.eqiad.wmnet with OS bookworm
  • 16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T400854)', diff saved to https://phabricator.wikimedia.org/P80357 and previous config saved to /var/cache/conftool/dbconfig/20250731-164610-ladsgroup.json
  • 16:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80356 and previous config saved to /var/cache/conftool/dbconfig/20250731-164547-ladsgroup.json
  • 16:35 jly@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:35 jly@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:35 jly@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:33 jly@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:32 jly@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:32 jly@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:32 jly@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:31 jly@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:31 jly@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:31 jly@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P80355 and previous config saved to /var/cache/conftool/dbconfig/20250731-163040-ladsgroup.json
  • 16:25 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d4-eqiad
  • 16:24 dancy@deploy1003: Finished scap sync-world: Testing T398875 (duration: 03m 43s)
  • 16:24 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d4-eqiad
  • 16:21 dancy@deploy1003: Started scap sync-world: Testing T398875
  • 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P80354 and previous config saved to /var/cache/conftool/dbconfig/20250731-161532-ladsgroup.json
  • 16:03 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:02 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:02 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:01 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:00 dancy@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 36s)
  • 16:00 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80353 and previous config saved to /var/cache/conftool/dbconfig/20250731-160024-ladsgroup.json
  • 16:00 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 16:00 ottomata: deploying eventgate-analytics external for T398922 and T396359
  • 15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T400854)', diff saved to https://phabricator.wikimedia.org/P80352 and previous config saved to /var/cache/conftool/dbconfig/20250731-155903-ladsgroup.json
  • 15:58 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T400854)', diff saved to https://phabricator.wikimedia.org/P80351 and previous config saved to /var/cache/conftool/dbconfig/20250731-155840-ladsgroup.json
  • 15:50 dancy@deploy1003: Started scap build-images: Publishing wmf/next image
  • 15:47 dancy@deploy1003: Installation of scap version "4.194.1" completed for 2 hosts
  • 15:45 dancy@deploy1003: Installing scap version "4.194.1" for 2 host(s)
  • 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P80350 and previous config saved to /var/cache/conftool/dbconfig/20250731-154333-ladsgroup.json
  • 15:38 hashar: Restarting Gerrit
  • 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P80349 and previous config saved to /var/cache/conftool/dbconfig/20250731-152826-ladsgroup.json
  • 15:27 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1033.eqiad.wmnet with OS bookworm
  • 15:15 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 15:14 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 15:14 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: sync
  • 15:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T400854)', diff saved to https://phabricator.wikimedia.org/P80348 and previous config saved to /var/cache/conftool/dbconfig/20250731-151318-ladsgroup.json
  • 15:13 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/proton: sync
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T400854)', diff saved to https://phabricator.wikimedia.org/P80347 and previous config saved to /var/cache/conftool/dbconfig/20250731-150856-ladsgroup.json
  • 15:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
  • 15:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80346 and previous config saved to /var/cache/conftool/dbconfig/20250731-150834-ladsgroup.json
  • 15:03 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1033.eqiad.wmnet with reason: host reimage
  • 14:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P80345 and previous config saved to /var/cache/conftool/dbconfig/20250731-145326-ladsgroup.json
  • 14:44 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1033.eqiad.wmnet with OS bookworm
  • 14:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P80344 and previous config saved to /var/cache/conftool/dbconfig/20250731-143819-ladsgroup.json
  • 14:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80343 and previous config saved to /var/cache/conftool/dbconfig/20250731-142311-ladsgroup.json
  • 14:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80342 and previous config saved to /var/cache/conftool/dbconfig/20250731-142050-ladsgroup.json
  • 14:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T400854)', diff saved to https://phabricator.wikimedia.org/P80341 and previous config saved to /var/cache/conftool/dbconfig/20250731-142027-ladsgroup.json
  • 14:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P80340 and previous config saved to /var/cache/conftool/dbconfig/20250731-140519-ladsgroup.json
  • 13:54 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:54 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki (duration: 09m 33s)
  • 13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P80339 and previous config saved to /var/cache/conftool/dbconfig/20250731-135012-ladsgroup.json
  • 13:48 lucaswerkmeister-wmde@deploy1003: mszwarc, lucaswerkmeister-wmde: Continuing with sync
  • 13:47 lucaswerkmeister-wmde@deploy1003: mszwarc, lucaswerkmeister-wmde: Backport for Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:44 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Set `wgCheckUserUserInfoCardFeatureVisible` to true on testwiki
  • 13:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-eqiad
  • 13:38 cmooney@cumin1003: START - Cookbook sre.network.tls for network device ssw1-d8-eqiad
  • 13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T400854)', diff saved to https://phabricator.wikimedia.org/P80338 and previous config saved to /var/cache/conftool/dbconfig/20250731-133504-ladsgroup.json
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T400854)', diff saved to https://phabricator.wikimedia.org/P80337 and previous config saved to /var/cache/conftool/dbconfig/20250731-133143-ladsgroup.json
  • 13:31 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:30 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for zhwiki: Allow local securepoll setup (T380020) (duration: 09m 24s)
  • 13:30 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: sync
  • 13:29 elukey@deploy1003: helmfile [staging] START helmfile.d/services/proton: sync
  • 13:29 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-eqiad
  • 13:29 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d1-eqiad
  • 13:25 lucaswerkmeister-wmde@deploy1003: stang, lucaswerkmeister-wmde: Continuing with sync
  • 13:25 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 13:23 lucaswerkmeister-wmde@deploy1003: stang, lucaswerkmeister-wmde: Backport for zhwiki: Allow local securepoll setup (T380020) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:22 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d1-eqiad
  • 13:21 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for zhwiki: Allow local securepoll setup (T380020)
  • 13:20 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 13:16 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye
  • 13:15 Lucas_WMDE: created securepoll_log on zhwiki via sql.php (T380020)
  • 13:12 elukey: install spicerack 11.4.0 on cumin100*
  • 12:59 elukey: install spicerack 11.4.0 on cumin2002
  • 12:36 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d1-eqiad
  • 12:35 cmooney@cumin1003: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device lsw1-d8-eqiad
  • 12:33 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d8-eqiad
  • 12:26 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:25 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:25 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:25 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:24 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:24 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:21 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:20 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:20 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:19 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:19 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:18 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:08 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-eqiad
  • 11:07 cmooney@cumin1003: START - Cookbook sre.network.tls for network device ssw1-d8-eqiad
  • 10:36 elukey: uploaded spicerack_11.4.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 10:32 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:32 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches eqiad - cmooney@cumin1003"
  • 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches eqiad - cmooney@cumin1003"
  • 10:25 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:19 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:01 fabfur: haproxykafka upgraded to v0.3.13 on A:cp (T400199)
  • 09:50 fabfur: upgrading haproxykafka to v0.3.13 on A:cp (T400199)
  • 09:13 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 09:02 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 02:46 eileen: config revision changed from 9fff33b8 to aae36688
  • 02:33 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1024.eqiad.wmnet with OS bookworm
  • 02:33 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 02:33 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 02:29 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1025.eqiad.wmnet with OS bookworm
  • 02:28 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 02:28 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 02:18 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1024.eqiad.wmnet with reason: host reimage
  • 02:15 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1024.eqiad.wmnet with reason: host reimage
  • 02:13 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1025.eqiad.wmnet with reason: host reimage
  • 02:07 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1025.eqiad.wmnet with reason: host reimage
  • 02:03 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1024.eqiad.wmnet with OS bookworm
  • 02:00 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1025.eqiad.wmnet with OS bookworm
  • 01:54 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:45 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:36 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:35 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 01:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 48s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:23 zabe@deploy1003: Finished scap sync-world: Backport for group0: Stop writing to cl_to and cl_collation (T399579) (duration: 08m 39s)
  • 00:18 zabe@deploy1003: zabe: Continuing with sync
  • 00:17 zabe@deploy1003: zabe: Backport for group0: Stop writing to cl_to and cl_collation (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:15 zabe@deploy1003: Started scap sync-world: Backport for group0: Stop writing to cl_to and cl_collation (T399579)
  • 00:14 zabe@deploy1003: Finished scap sync-world: Backport for CommonSettings: Stop setting wgDBuser (duration: 13m 32s)
  • 00:07 zabe@deploy1003: zabe: Continuing with sync
  • 00:05 zabe@deploy1003: zabe: Backport for CommonSettings: Stop setting wgDBuser synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:01 zabe@deploy1003: Started scap sync-world: Backport for CommonSettings: Stop setting wgDBuser

2025-07-30

  • 23:02 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1036.eqiad.wmnet with OS bookworm
  • 22:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:44 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
  • 22:40 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1036.eqiad.wmnet with reason: host reimage
  • 22:22 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1036.eqiad.wmnet with OS bookworm
  • 22:00 dreamyjazz@deploy1003: Finished scap sync-world: Backport for ListPage: don't try to list votes for jump polls (T400831 T75915 T398126), ListPage: don't try to list votes for jump polls (T400831 T75915 T398126) (duration: 42m 40s)
  • 21:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 21:47 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 21:46 dreamyjazz@deploy1003: dreamyjazz: Backport for ListPage: don't try to list votes for jump polls (T400831 T75915 T398126), ListPage: don't try to list votes for jump polls (T400831 T75915 T398126) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:18 dreamyjazz@deploy1003: Started scap sync-world: Backport for ListPage: don't try to list votes for jump polls (T400831 T75915 T398126), ListPage: don't try to list votes for jump polls (T400831 T75915 T398126)
  • 20:38 dancy@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: adjust down 2025 namespace protection (T400833) (duration: 08m 27s)
  • 20:33 dancy@deploy1003: robertsky, dancy: Continuing with sync
  • 20:32 dancy@deploy1003: robertsky, dancy: Backport for wikimaniawiki: adjust down 2025 namespace protection (T400833) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:30 dancy@deploy1003: Started scap sync-world: Backport for wikimaniawiki: adjust down 2025 namespace protection (T400833)
  • 20:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:23 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1037.eqiad.wmnet with OS bookworm
  • 20:22 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host clouddb1025
  • 20:22 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host clouddb1025
  • 20:21 kgraessle@deploy1003: Finished scap sync-world: Backport for Add experiment code to group by toggle (T397728) (duration: 09m 53s)
  • 20:16 kgraessle@deploy1003: kgraessle, jsn: Continuing with sync
  • 20:13 kgraessle@deploy1003: kgraessle, jsn: Backport for Add experiment code to group by toggle (T397728) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:12 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:12 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1025 - vriley@cumin1002"
  • 20:12 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1025 - vriley@cumin1002"
  • 20:11 kgraessle@deploy1003: Started scap sync-world: Backport for Add experiment code to group by toggle (T397728)
  • 20:06 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 20:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:03 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
  • 19:59 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1037.eqiad.wmnet with reason: host reimage
  • 19:55 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:42 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1037.eqiad.wmnet with OS bookworm
  • 19:41 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:40 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:33 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:33 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1024 - vriley@cumin1002"
  • 19:33 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1024 - vriley@cumin1002"
  • 19:28 ejegg: payments-wiki upgraded from 7de1798c to 0ab5bab9
  • 19:27 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 19:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host clouddb1024
  • 19:26 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host clouddb1024
  • 19:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1023.eqiad.wmnet with OS bookworm
  • 19:25 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:07 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1023.eqiad.wmnet with reason: host reimage
  • 19:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb1022.eqiad.wmnet with OS bookworm
  • 19:05 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:02 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:01 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1023.eqiad.wmnet with reason: host reimage
  • 18:51 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2033.codfw.wmnet with OS bookworm
  • 18:50 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1023.eqiad.wmnet with OS bookworm
  • 18:47 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1022.eqiad.wmnet with reason: host reimage
  • 18:41 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1022.eqiad.wmnet with reason: host reimage
  • 18:30 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 18:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2033.codfw.wmnet with reason: host reimage
  • 18:16 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.12 refs T396373
  • 18:16 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2033.codfw.wmnet with reason: host reimage
  • 18:06 brennen: train 1.45.0-wmf.12 status: no current blockers, rolling to group1 using spiderpig
  • 17:58 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host logstash2033
  • 17:58 cwhite@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2033
  • 17:58 cwhite@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2033
  • 17:58 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2033.codfw.wmnet 16.0.192.10.in-addr.arpa 6.1.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:57 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2033.codfw.wmnet 16.0.192.10.in-addr.arpa 6.1.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 17:57 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2033 - cwhite@cumin2002"
  • 17:57 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2033 - cwhite@cumin2002"
  • 17:51 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 17:51 cwhite@cumin2002: START - Cookbook sre.hosts.move-vlan for host logstash2033
  • 17:50 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2033.codfw.wmnet with OS bookworm
  • 17:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T399728)', diff saved to https://phabricator.wikimedia.org/P80332 and previous config saved to /var/cache/conftool/dbconfig/20250730-174921-fceratto.json
  • 17:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P80331 and previous config saved to /var/cache/conftool/dbconfig/20250730-173413-fceratto.json
  • 17:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P80330 and previous config saved to /var/cache/conftool/dbconfig/20250730-171906-fceratto.json
  • 17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T399728)', diff saved to https://phabricator.wikimedia.org/P80329 and previous config saved to /var/cache/conftool/dbconfig/20250730-170359-fceratto.json
  • 16:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1252 (T399728)', diff saved to https://phabricator.wikimedia.org/P80328 and previous config saved to /var/cache/conftool/dbconfig/20250730-165853-fceratto.json
  • 16:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 16:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T399728)', diff saved to https://phabricator.wikimedia.org/P80327 and previous config saved to /var/cache/conftool/dbconfig/20250730-165831-fceratto.json
  • 16:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P80326 and previous config saved to /var/cache/conftool/dbconfig/20250730-164323-fceratto.json
  • 16:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P80325 and previous config saved to /var/cache/conftool/dbconfig/20250730-162816-fceratto.json
  • 16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T399728)', diff saved to https://phabricator.wikimedia.org/P80324 and previous config saved to /var/cache/conftool/dbconfig/20250730-161308-fceratto.json
  • 16:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T399728)', diff saved to https://phabricator.wikimedia.org/P80323 and previous config saved to /var/cache/conftool/dbconfig/20250730-160812-fceratto.json
  • 16:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 16:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T399728)', diff saved to https://phabricator.wikimedia.org/P80322 and previous config saved to /var/cache/conftool/dbconfig/20250730-160749-fceratto.json
  • 16:01 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1022.eqiad.wmnet with OS bookworm
  • 15:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P80321 and previous config saved to /var/cache/conftool/dbconfig/20250730-155241-fceratto.json
  • 15:39 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2034.codfw.wmnet with OS bookworm
  • 15:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P80320 and previous config saved to /var/cache/conftool/dbconfig/20250730-153734-fceratto.json
  • 15:28 sbassett: Deployed security mitigation for T400697
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T399728)', diff saved to https://phabricator.wikimedia.org/P80319 and previous config saved to /var/cache/conftool/dbconfig/20250730-152226-fceratto.json
  • 15:18 cdanis: all done 💙cdanis@cumin1003.eqiad.wmnet ~ 🕚☕ sudo cumin 'A:cp' 'run-puppet-agent --enable "cdanis deploy I74ada0e T400753"'
  • 15:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T399728)', diff saved to https://phabricator.wikimedia.org/P80318 and previous config saved to /var/cache/conftool/dbconfig/20250730-151821-fceratto.json
  • 15:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 15:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T399728)', diff saved to https://phabricator.wikimedia.org/P80317 and previous config saved to /var/cache/conftool/dbconfig/20250730-151758-fceratto.json
  • 15:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2034.codfw.wmnet with reason: host reimage
  • 15:09 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2034.codfw.wmnet with reason: host reimage
  • 15:06 cdanis: begin phase2 💔cdanis@cumin1003.eqiad.wmnet ~ 🕚☕ sudo cumin 'A:cp' 'disable-puppet "cdanis deploy I74ada0e T400753"'
  • 15:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P80316 and previous config saved to /var/cache/conftool/dbconfig/20250730-150250-fceratto.json
  • 14:59 brouberol@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 14:59 cdanis: phase1 💙cdanis@cumin1003.eqiad.wmnet ~ 🕚☕ sudo cumin 'A:cp' 'run-puppet-agent --enable "cdanis deploy I74ada0e T400753"'
  • 14:50 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host logstash2034
  • 14:50 cwhite@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2034
  • 14:50 cwhite@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2034
  • 14:50 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2034.codfw.wmnet 30.16.192.10.in-addr.arpa 0.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:50 cdanis: 💙cdanis@cumin1003.eqiad.wmnet ~ 🕥☕ sudo cumin 'A:cp' 'disable-puppet "cdanis deploy I74ada0e T400753"'
  • 14:49 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2034.codfw.wmnet 30.16.192.10.in-addr.arpa 0.3.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 14:49 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2034 - cwhite@cumin2002"
  • 14:48 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2034 - cwhite@cumin2002"
  • 14:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P80315 and previous config saved to /var/cache/conftool/dbconfig/20250730-144743-fceratto.json
  • 14:34 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 14:34 cwhite@cumin2002: START - Cookbook sre.hosts.move-vlan for host logstash2034
  • 14:33 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2034.codfw.wmnet with OS bookworm
  • 14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T399728)', diff saved to https://phabricator.wikimedia.org/P80314 and previous config saved to /var/cache/conftool/dbconfig/20250730-143235-fceratto.json
  • 14:32 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T399728)', diff saved to https://phabricator.wikimedia.org/P80313 and previous config saved to /var/cache/conftool/dbconfig/20250730-142644-fceratto.json
  • 14:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T399728)', diff saved to https://phabricator.wikimedia.org/P80311 and previous config saved to /var/cache/conftool/dbconfig/20250730-142314-fceratto.json
  • 14:19 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:19 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:18 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:13 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:11 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:09 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P80310 and previous config saved to /var/cache/conftool/dbconfig/20250730-140806-fceratto.json
  • 13:54 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 13:53 brouberol: kafka-jumbo1018 is added to the cluster, puppet ran on all kafka/zookeeper hosts, external-services was updated on dse-k8s-eqiad, codfw and eqiad - T398826
  • 13:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P80309 and previous config saved to /var/cache/conftool/dbconfig/20250730-135259-fceratto.json
  • 13:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:52 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:52 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:51 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:51 brouberol@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T399728)', diff saved to https://phabricator.wikimedia.org/P80308 and previous config saved to /var/cache/conftool/dbconfig/20250730-133751-fceratto.json
  • 13:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T399728)', diff saved to https://phabricator.wikimedia.org/P80307 and previous config saved to /var/cache/conftool/dbconfig/20250730-133253-fceratto.json
  • 13:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 13:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T399728)', diff saved to https://phabricator.wikimedia.org/P80306 and previous config saved to /var/cache/conftool/dbconfig/20250730-133230-fceratto.json
  • 13:30 brouberol@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 13:26 aokoth@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host vrts1004.eqiad.wmnet
  • 13:26 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host vrts1004.eqiad.wmnet with OS bookworm
  • 13:24 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:18 sgimeno@deploy1003: Finished scap sync-world: Backport for Growth: remove conditional user options for get-started-notification, MetricsPlatform: Disable synchronous configs fetching (T398422) (duration: 11m 26s)
  • 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P80305 and previous config saved to /var/cache/conftool/dbconfig/20250730-131723-fceratto.json
  • 13:13 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on vrts1004.eqiad.wmnet with reason: host reimage
  • 13:12 sgimeno@deploy1003: sgimeno, phuedx: Continuing with sync
  • 13:09 sgimeno@deploy1003: sgimeno, phuedx: Backport for Growth: remove conditional user options for get-started-notification, MetricsPlatform: Disable synchronous configs fetching (T398422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:07 aokoth@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on vrts1004.eqiad.wmnet with reason: host reimage
  • 13:06 sgimeno@deploy1003: Started scap sync-world: Backport for Growth: remove conditional user options for get-started-notification, MetricsPlatform: Disable synchronous configs fetching (T398422)
  • 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P80304 and previous config saved to /var/cache/conftool/dbconfig/20250730-130215-fceratto.json
  • 12:59 aokoth@cumin1003: START - Cookbook sre.hosts.reimage for host vrts1004.eqiad.wmnet with OS bookworm
  • 12:57 aokoth@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1004.eqiad.wmnet - aokoth@cumin1003"
  • 12:57 aokoth@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM vrts1004.eqiad.wmnet - aokoth@cumin1003"
  • 12:57 aokoth@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) vrts1004.eqiad.wmnet on all recursors
  • 12:57 aokoth@cumin1003: START - Cookbook sre.dns.wipe-cache vrts1004.eqiad.wmnet on all recursors
  • 12:57 aokoth@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:57 aokoth@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1004.eqiad.wmnet - aokoth@cumin1003"
  • 12:57 aokoth@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM vrts1004.eqiad.wmnet - aokoth@cumin1003"
  • 12:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
  • 12:53 aokoth@cumin1003: START - Cookbook sre.dns.netbox
  • 12:52 aokoth@cumin1003: START - Cookbook sre.ganeti.makevm for new host vrts1004.eqiad.wmnet
  • 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T399728)', diff saved to https://phabricator.wikimedia.org/P80303 and previous config saved to /var/cache/conftool/dbconfig/20250730-124708-fceratto.json
  • 12:42 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 12:42 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
  • 12:42 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T399728)', diff saved to https://phabricator.wikimedia.org/P80302 and previous config saved to /var/cache/conftool/dbconfig/20250730-124159-fceratto.json
  • 12:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T399728)', diff saved to https://phabricator.wikimedia.org/P80301 and previous config saved to /var/cache/conftool/dbconfig/20250730-124137-fceratto.json
  • 12:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [Growth] Remove support code for Surfacing Structured Tasks experiment (T397515), [Growth] Remove feature flags related to Surfacing Structured Tasks (T397515) (duration: 09m 06s)
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P80300 and previous config saved to /var/cache/conftool/dbconfig/20250730-122629-fceratto.json
  • 12:24 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 12:22 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:22 urbanecm@deploy1003: urbanecm: Backport for [Growth] Remove support code for Surfacing Structured Tasks experiment (T397515), [Growth] Remove feature flags related to Surfacing Structured Tasks (T397515) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:20 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] Remove support code for Surfacing Structured Tasks experiment (T397515), [Growth] Remove feature flags related to Surfacing Structured Tasks (T397515)
  • 12:19 brouberol: kafka-jumbo1017 has been added to the cluster; puppet ran on all kafka/zookeeper hosts and external-services was updated on dse-k8s-eqiad, codfw and eqiad - T398826
  • 12:18 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:17 brouberol@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:16 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:16 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:12 ladsgroup@deploy1003: Finished scap sync-world: Backport for DropUnusedTables: add --dry-run option (T395928), DropUnusedTables: add --dry-run option (T395928) (duration: 08m 20s)
  • 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P80299 and previous config saved to /var/cache/conftool/dbconfig/20250730-121122-fceratto.json
  • 12:06 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:05 ladsgroup@deploy1003: ladsgroup: Backport for DropUnusedTables: add --dry-run option (T395928), DropUnusedTables: add --dry-run option (T395928) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:03 ladsgroup@deploy1003: Started scap sync-world: Backport for DropUnusedTables: add --dry-run option (T395928), DropUnusedTables: add --dry-run option (T395928)
  • 11:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T399728)', diff saved to https://phabricator.wikimedia.org/P80298 and previous config saved to /var/cache/conftool/dbconfig/20250730-115614-fceratto.json
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 11:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T399728)', diff saved to https://phabricator.wikimedia.org/P80297 and previous config saved to /var/cache/conftool/dbconfig/20250730-115112-fceratto.json
  • 11:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 11:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T399728)', diff saved to https://phabricator.wikimedia.org/P80296 and previous config saved to /var/cache/conftool/dbconfig/20250730-115049-fceratto.json
  • 11:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 11:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 11:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 11:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 11:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 11:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 11:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P80295 and previous config saved to /var/cache/conftool/dbconfig/20250730-113541-fceratto.json
  • 11:34 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:34 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:33 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:33 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:32 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:32 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:21 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P80294 and previous config saved to /var/cache/conftool/dbconfig/20250730-112034-fceratto.json
  • 11:19 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:16 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 11:15 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 11:07 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:07 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T399728)', diff saved to https://phabricator.wikimedia.org/P80293 and previous config saved to /var/cache/conftool/dbconfig/20250730-110526-fceratto.json
  • 11:05 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:04 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:01 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:01 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T399728)', diff saved to https://phabricator.wikimedia.org/P80292 and previous config saved to /var/cache/conftool/dbconfig/20250730-105926-fceratto.json
  • 10:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T399728)', diff saved to https://phabricator.wikimedia.org/P80291 and previous config saved to /var/cache/conftool/dbconfig/20250730-105904-fceratto.json
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P80290 and previous config saved to /var/cache/conftool/dbconfig/20250730-104357-fceratto.json
  • 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P80289 and previous config saved to /var/cache/conftool/dbconfig/20250730-102850-fceratto.json
  • 10:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T399728)', diff saved to https://phabricator.wikimedia.org/P80288 and previous config saved to /var/cache/conftool/dbconfig/20250730-101343-fceratto.json
  • 10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T399728)', diff saved to https://phabricator.wikimedia.org/P80287 and previous config saved to /var/cache/conftool/dbconfig/20250730-100934-fceratto.json
  • 10:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T399728)', diff saved to https://phabricator.wikimedia.org/P80286 and previous config saved to /var/cache/conftool/dbconfig/20250730-100854-fceratto.json
  • 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P80285 and previous config saved to /var/cache/conftool/dbconfig/20250730-095346-fceratto.json
  • 09:53 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[1204-1205].eqiad.wmnet
  • 09:53 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db[1204-1205].eqiad.wmnet
  • 09:44 gkyziridis@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:43 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:43 gkyziridis@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P80284 and previous config saved to /var/cache/conftool/dbconfig/20250730-093839-fceratto.json
  • 09:37 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:30 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T399728)', diff saved to https://phabricator.wikimedia.org/P80283 and previous config saved to /var/cache/conftool/dbconfig/20250730-092332-fceratto.json
  • 09:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T399728)', diff saved to https://phabricator.wikimedia.org/P80282 and previous config saved to /var/cache/conftool/dbconfig/20250730-091829-fceratto.json
  • 09:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 09:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T399728)', diff saved to https://phabricator.wikimedia.org/P80281 and previous config saved to /var/cache/conftool/dbconfig/20250730-091817-fceratto.json
  • 09:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2010.codfw.wmnet with OS bookworm
  • 09:16 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 09:16 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 09:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:10 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[1204-1205].eqiad.wmnet with reason: upgrade mariadb
  • 09:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 09:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P80280 and previous config saved to /var/cache/conftool/dbconfig/20250730-090309-fceratto.json
  • 08:59 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2183-2184].codfw.wmnet
  • 08:59 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db[2183-2184].codfw.wmnet
  • 08:59 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2010.codfw.wmnet with reason: host reimage
  • 08:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P80279 and previous config saved to /var/cache/conftool/dbconfig/20250730-084800-fceratto.json
  • 08:38 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:38 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2184.codfw.wmnet with reason: replication will stop
  • 08:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2183.codfw.wmnet with reason: upgrade mariadb
  • 08:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 08:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T399728)', diff saved to https://phabricator.wikimedia.org/P80278 and previous config saved to /var/cache/conftool/dbconfig/20250730-083252-fceratto.json
  • 08:28 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:28 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T399728)', diff saved to https://phabricator.wikimedia.org/P80276 and previous config saved to /var/cache/conftool/dbconfig/20250730-082758-fceratto.json
  • 08:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 08:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T399728)', diff saved to https://phabricator.wikimedia.org/P80275 and previous config saved to /var/cache/conftool/dbconfig/20250730-082735-fceratto.json
  • 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P80274 and previous config saved to /var/cache/conftool/dbconfig/20250730-081228-fceratto.json
  • 08:09 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2010.codfw.wmnet with OS bookworm
  • 08:05 mlitn@deploy1003: Finished scap sync-world: Backport for Add new MediaSearch config/coefficients (T385286) (duration: 09m 42s)
  • 08:03 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 08:03 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 08:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:00 mlitn@deploy1003: mlitn: Continuing with sync
  • 07:58 mlitn@deploy1003: mlitn: Backport for Add new MediaSearch config/coefficients (T385286) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P80273 and previous config saved to /var/cache/conftool/dbconfig/20250730-075720-fceratto.json
  • 07:56 jelto@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:56 jelto@cumin1003: START - Cookbook sre.dns.wipe-cache 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:56 mlitn@deploy1003: Started scap sync-world: Backport for Add new MediaSearch config/coefficients (T385286)
  • 07:53 jelto@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:53 jelto@cumin1003: START - Cookbook sre.dns.wipe-cache 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:51 jelto@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:51 jelto@cumin1003: START - Cookbook sre.dns.wipe-cache 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:50 jelto@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:50 jelto@cumin1003: START - Cookbook sre.dns.wipe-cache 'https://gitlab.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 07:50 jelto@dns1004: END - running authdns-update
  • 07:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:49 jelto@dns1004: START - running authdns-update
  • 07:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T399728)', diff saved to https://phabricator.wikimedia.org/P80272 and previous config saved to /var/cache/conftool/dbconfig/20250730-074213-fceratto.json
  • 07:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T399728)', diff saved to https://phabricator.wikimedia.org/P80271 and previous config saved to /var/cache/conftool/dbconfig/20250730-073517-fceratto.json
  • 07:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 07:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:37 jelto@cumin1003: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1004.wikimedia.org
  • 01:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 52s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-07-29

  • 23:10 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2035.codfw.wmnet with OS bookworm
  • 22:48 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2035.codfw.wmnet with reason: host reimage
  • 22:42 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2035.codfw.wmnet with reason: host reimage
  • 22:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250714/ using stat1009.eqiad.wmnet)
  • 22:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host logstash2035
  • 22:23 cwhite@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2035
  • 22:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 22:15 kemayo@deploy1003: Finished scap sync-world: Backport for Enable DiscussionTools thanks on existing "report incident" wikis (T366095) (duration: 12m 28s)
  • 22:15 cwhite@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2035
  • 22:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2035.codfw.wmnet 28.32.192.10.in-addr.arpa 8.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:15 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2035.codfw.wmnet 28.32.192.10.in-addr.arpa 8.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 22:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:15 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2035 - cwhite@cumin2002"
  • 22:15 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2035 - cwhite@cumin2002"
  • 22:14 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250714/ using stat1009.eqiad.wmnet)
  • 22:10 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:10 cwhite@cumin2002: START - Cookbook sre.hosts.move-vlan for host logstash2035
  • 22:10 kemayo@deploy1003: kemayo: Continuing with sync
  • 22:09 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2035.codfw.wmnet with OS bookworm
  • 22:05 kemayo@deploy1003: kemayo: Backport for Enable DiscussionTools thanks on existing "report incident" wikis (T366095) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:03 kemayo@deploy1003: Started scap sync-world: Backport for Enable DiscussionTools thanks on existing "report incident" wikis (T366095)
  • 21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2091.codfw.wmnet with reason: host reimage
  • 21:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2091.codfw.wmnet with reason: host reimage
  • 21:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 21:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250714/ using stat1009.eqiad.wmnet)
  • 21:16 ryankemper@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 21:09 ryankemper@cumin1002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 21:03 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 00m 57s)
  • 21:02 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 20:42 cdanis@deploy1003: Finished scap sync-world: Backport for probenet: Report CDN host handling each measure request (T398596) (duration: 10m 27s)
  • 20:42 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 20:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 20:37 cdanis@deploy1003: cdanis: Continuing with sync
  • 20:34 cdanis@deploy1003: cdanis: Backport for probenet: Report CDN host handling each measure request (T398596) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:32 cdanis@deploy1003: Started scap sync-world: Backport for probenet: Report CDN host handling each measure request (T398596)
  • 20:16 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 20:16 urbanecm@deploy1003: Finished scap sync-world: Backport for Use FallbackContentHandler for another undeployed content handler (T124748), Simplify $wgContactConfig required checkboxes validation (duration: 11m 43s)
  • 20:12 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1023.eqiad.wmnet with OS bookworm
  • 20:11 urbanecm@deploy1003: matmarex, urbanecm: Continuing with sync
  • 20:08 urbanecm: Run `mwscript-k8s --file=users.txt --follow -- extensions/CentralAuth/maintenance/attachAccount.php --wiki=aawiki --userlist users.txt` (T396091; users.txt contains `Inverted Pages`)
  • 20:08 urbanecm: Run `mwscript-k8s --follow -- extensions/CentralAuth/maintenance/migrateAccount.php --wiki=aawiki --username='Inverted Pages' --auto` (T396091)
  • 20:07 urbanecm@deploy1003: matmarex, urbanecm: Backport for Use FallbackContentHandler for another undeployed content handler (T124748), Simplify $wgContactConfig required checkboxes validation synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 urbanecm@deploy1003: Started scap sync-world: Backport for Use FallbackContentHandler for another undeployed content handler (T124748), Simplify $wgContactConfig required checkboxes validation
  • 20:03 urbanecm: Run mwscript-k8s --comment="T400618" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=dewiki --logwiki=metawiki 'Editor Socks' 'Socks'
  • 20:01 dzahn@dns1004: END - running authdns-update
  • 19:59 dzahn@dns1004: START - running authdns-update
  • 19:53 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1022.eqiad.wmnet with OS bookworm
  • 19:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:31 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:59 andrew@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1042.eqiad.wmnet']
  • 18:59 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1042.eqiad.wmnet']
  • 18:59 andrew@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1042.eqiad.wmnet']
  • 18:59 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1042.eqiad.wmnet']
  • 18:58 andrew@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1042']
  • 18:57 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1042']
  • 18:57 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1042']
  • 18:57 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 18:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:44 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 18:21 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:21 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches eqiad - cmooney@cumin1003"
  • 18:21 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches eqiad - cmooney@cumin1003"
  • 18:18 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 18:13 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.12 refs T396373
  • 18:11 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:08 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 18:03 brennen: train 1.45.0-wmf.12 status: no current blockers, rolling to group0 using spiderpig
  • 18:00 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:59 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host clouddb1023
  • 17:59 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host clouddb1023
  • 17:58 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1023 - vriley@cumin1002"
  • 17:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1023 - vriley@cumin1002"
  • 17:56 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 17:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 17:51 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:49 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1032.eqiad.wmnet
  • 17:45 dancy@deploy1003: Installation of scap version "4.193.0" completed for 2 hosts
  • 17:43 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 17:43 dancy@deploy1003: Installing scap version "4.193.0" for 2 host(s)
  • 17:42 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 16:56 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 16:53 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 16:40 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2010.codfw.wmnet with OS bookworm
  • 16:25 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:18 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 16:17 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 16:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix haproxy no header condition - oblivian@cumin1003"
  • 16:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix haproxy no header condition - oblivian@cumin1003
  • 16:16 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix haproxy no header condition - oblivian@cumin1003
  • 16:16 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix haproxy no header condition - oblivian@cumin1003"
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ff
  • 16:11 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ff
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.fe
  • 16:10 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.fe
  • 16:10 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.fd
  • 16:10 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.fd
  • 16:10 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.fc
  • 16:09 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.fc
  • 16:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.fb
  • 16:09 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.fb
  • 16:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.fa
  • 16:08 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.fa
  • 16:08 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f9
  • 16:07 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f9
  • 16:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f8
  • 16:07 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f8
  • 16:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f7
  • 16:06 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f7
  • 16:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f6
  • 16:06 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f6
  • 16:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f5
  • 16:05 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f5
  • 16:05 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f4
  • 16:05 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f4
  • 16:05 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f3
  • 16:04 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f3
  • 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f2
  • 16:04 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f2
  • 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f1
  • 16:03 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f1
  • 16:03 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f0
  • 16:03 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f0
  • 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ef
  • 16:02 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ef
  • 16:02 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ee
  • 16:01 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ee
  • 16:01 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ed
  • 16:01 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ed
  • 16:01 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ec
  • 16:00 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ec
  • 16:00 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.eb
  • 15:59 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.eb
  • 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ea
  • 15:59 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ea
  • 15:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e9
  • 15:58 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e9
  • 15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e8
  • 15:58 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e8
  • 15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e7
  • 15:57 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e7
  • 15:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e6
  • 15:57 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e6
  • 15:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e5
  • 15:56 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e5
  • 15:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e4
  • 15:55 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e4
  • 15:55 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e3
  • 15:55 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e3
  • 15:55 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e2
  • 15:54 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e2
  • 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e1
  • 15:54 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e1
  • 15:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.e0
  • 15:53 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.e0
  • 15:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.df
  • 15:53 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.df
  • 15:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.de
  • 15:52 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.de
  • 15:52 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.dd
  • 15:51 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.dd
  • 15:51 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.dc
  • 15:51 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.dc
  • 15:51 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.db
  • 15:50 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.db
  • 15:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.da
  • 15:49 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.da
  • 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d9
  • 15:49 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d9
  • 15:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d8
  • 15:48 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d8
  • 15:48 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d7
  • 15:47 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d7
  • 15:47 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d6
  • 15:47 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d6
  • 15:47 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d5
  • 15:46 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d5
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d4
  • 15:46 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d4
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d3
  • 15:45 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d3
  • 15:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d2
  • 15:44 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d2
  • 15:44 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d1
  • 15:44 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d1
  • 15:44 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.d0
  • 15:43 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.d0
  • 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cf
  • 15:43 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cf
  • 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ce
  • 15:42 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ce
  • 15:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
  • 15:41 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
  • 15:41 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cc
  • 15:41 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cc
  • 15:41 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cb
  • 15:40 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cb
  • 15:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ca
  • 15:39 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ca
  • 15:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c9
  • 15:39 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c9
  • 15:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c8
  • 15:38 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c8
  • 15:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c7
  • 15:38 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c7
  • 15:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c6
  • 15:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c6
  • 15:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c5
  • 15:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c5
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c4
  • 15:36 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c4
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c3
  • 15:35 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c3
  • 15:35 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c2
  • 15:35 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c2
  • 15:35 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c1
  • 15:34 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c1
  • 15:34 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.c0
  • 15:33 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.c0
  • 15:33 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.bf
  • 15:33 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.bf
  • 15:33 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.be
  • 15:32 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.be
  • 15:32 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.bd
  • 15:31 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.bd
  • 15:31 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.bc
  • 15:31 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.bc
  • 15:31 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.bb
  • 15:30 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.bb
  • 15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ba
  • 15:30 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ba
  • 15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b9
  • 15:29 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b9
  • 15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b8
  • 15:28 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b8
  • 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b7
  • 15:28 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b7
  • 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b6
  • 15:27 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b6
  • 15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b5
  • 15:27 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b5
  • 15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b4
  • 15:26 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b4
  • 15:26 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b3
  • 15:25 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b3
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b2
  • 15:25 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b2
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b1
  • 15:24 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b1
  • 15:24 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.b0
  • 15:23 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.b0
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.af
  • 15:23 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.af
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ae
  • 15:22 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ae
  • 15:22 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ad
  • 15:21 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
  • 15:21 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ac
  • 15:21 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ac
  • 15:21 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ab
  • 15:20 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ab
  • 15:20 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.aa
  • 15:19 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.aa
  • 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a9
  • 15:19 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a9
  • 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a8
  • 15:18 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a8
  • 15:18 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a7
  • 15:18 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a7
  • 15:18 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a6
  • 15:17 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a6
  • 15:17 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a5
  • 15:16 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a5
  • 15:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a4
  • 15:16 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a4
  • 15:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a3
  • 15:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:15 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a3
  • 15:15 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a2
  • 15:14 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a2
  • 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a1
  • 15:14 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a1
  • 15:14 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.a0
  • 15:13 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.a0
  • 15:13 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.9f
  • 15:12 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.9f
  • 15:12 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.9e
  • 15:12 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.9e
  • 15:12 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.9d
  • 15:11 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.9d
  • 15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.9c
  • 15:11 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.9c
  • 15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.9b
  • 15:10 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.9b
  • 15:10 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.9a
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T399249)', diff saved to https://phabricator.wikimedia.org/P80267 and previous config saved to /var/cache/conftool/dbconfig/20250729-151015-marostegui.json
  • 15:09 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.9a
  • 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.99
  • 15:09 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.99
  • 15:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.98
  • 15:08 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.98
  • 15:08 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.97
  • 15:08 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.97
  • 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.96
  • 15:07 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.96
  • 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.95
  • 15:06 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.95
  • 15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.94
  • 15:06 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.94
  • 15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.93
  • 15:05 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.93
  • 15:05 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.92
  • 15:05 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.92
  • 15:05 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.91
  • 15:04 brennen@deploy1003: Finished deploy [phabricator/deployment@1df7631]: deploy phab1004 for T400718 (duration: 00m 50s)
  • 15:04 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.91
  • 15:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.90
  • 15:04 brennen@deploy1003: Started deploy [phabricator/deployment@1df7631]: deploy phab1004 for T400718
  • 15:03 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.90
  • 15:03 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.8f
  • 15:03 brennen@deploy1003: Finished deploy [phabricator/deployment@1df7631]: test deploy phab2002 for T400718 (duration: 00m 42s)
  • 15:03 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.8f
  • 15:03 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.8e
  • 15:03 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1005.eqiad.wmnet with reason: Phabricator deploy
  • 15:02 brennen@deploy1003: Started deploy [phabricator/deployment@1df7631]: test deploy phab2002 for T400718
  • 15:02 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.8e
  • 15:02 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.8d
  • 15:02 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
  • 15:02 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
  • 15:02 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.8d
  • 15:02 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.8c
  • 15:01 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.8c
  • 15:01 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.8b
  • 15:00 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.8b
  • 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.8a
  • 15:00 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.8a
  • 15:00 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.89
  • 14:59 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.89
  • 14:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.88
  • 14:59 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.88
  • 14:58 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.87
  • 14:58 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.87
  • 14:58 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.86
  • 14:57 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.86
  • 14:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.85
  • 14:57 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.85
  • 14:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.84
  • 14:57 fabfur: haproxykafka updated to 0.3.12 on A:cp (T400199)
  • 14:56 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.84
  • 14:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.83
  • 14:56 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.83
  • 14:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.82
  • 14:55 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.82
  • 14:55 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.81
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P80266 and previous config saved to /var/cache/conftool/dbconfig/20250729-145507-marostegui.json
  • 14:54 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.81
  • 14:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.80
  • 14:54 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.80
  • 14:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.7f
  • 14:53 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.7f
  • 14:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.7e
  • 14:53 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.7e
  • 14:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.7d
  • 14:52 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.7d
  • 14:52 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.7c
  • 14:51 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.7c
  • 14:51 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.7b
  • 14:51 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.7b
  • 14:51 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.7a
  • 14:50 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.7a
  • 14:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.79
  • 14:50 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.79
  • 14:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.78
  • 14:49 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.78
  • 14:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.77
  • 14:48 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.77
  • 14:48 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.76
  • 14:48 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.76
  • 14:48 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.75
  • 14:47 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.75
  • 14:47 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.74
  • 14:47 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.74
  • 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.73
  • 14:46 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.73
  • 14:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.72
  • 14:45 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.72
  • 14:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.71
  • 14:45 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.71
  • 14:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.70
  • 14:44 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.70
  • 14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.6f
  • 14:43 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.6f
  • 14:43 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.6e
  • 14:43 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.6e
  • 14:43 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.6d
  • 14:42 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.6d
  • 14:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.6c
  • 14:42 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.6c
  • 14:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.6b
  • 14:41 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.6b
  • 14:41 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.6a
  • 14:40 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.6a
  • 14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.69
  • 14:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:40 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.69
  • 14:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.68
  • 14:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T399728)', diff saved to https://phabricator.wikimedia.org/P80265 and previous config saved to /var/cache/conftool/dbconfig/20250729-144008-fceratto.json
  • 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P80264 and previous config saved to /var/cache/conftool/dbconfig/20250729-144000-marostegui.json
  • 14:39 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.68
  • 14:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.67
  • 14:39 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.67
  • 14:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.66
  • 14:38 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.66
  • 14:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.65
  • 14:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.65
  • 14:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.64
  • 14:37 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on testwiki (T399579) (duration: 11m 12s)
  • 14:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.64
  • 14:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.63
  • 14:36 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.63
  • 14:36 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.62
  • 14:35 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.62
  • 14:35 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.61
  • 14:35 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.61
  • 14:35 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.60
  • 14:34 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.60
  • 14:34 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1173950 and upgrading haproxykafka to 0.3.12 on A:cp (T400199)
  • 14:34 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.5f
  • 14:34 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.5f
  • 14:34 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.5e
  • 14:33 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.5e
  • 14:33 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.5d
  • 14:33 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.5d
  • 14:32 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.5c
  • 14:32 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.5c
  • 14:32 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.5b
  • 14:31 zabe@deploy1003: zabe: Continuing with sync
  • 14:31 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.5b
  • 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.5a
  • 14:31 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.5a
  • 14:31 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.59
  • 14:30 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.59
  • 14:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.58
  • 14:29 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.58
  • 14:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.57
  • 14:29 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.57
  • 14:29 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.56
  • 14:28 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.56
  • 14:28 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.55
  • 14:28 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on testwiki (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:28 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.55
  • 14:28 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.54
  • 14:27 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.54
  • 14:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.53
  • 14:26 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.53
  • 14:26 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.52
  • 14:26 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on testwiki (T399579)
  • 14:26 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.52
  • 14:26 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.51
  • 14:25 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.51
  • 14:25 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.50
  • 14:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P80263 and previous config saved to /var/cache/conftool/dbconfig/20250729-142500-fceratto.json
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T399249)', diff saved to https://phabricator.wikimedia.org/P80262 and previous config saved to /var/cache/conftool/dbconfig/20250729-142452-marostegui.json
  • 14:24 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.50
  • 14:24 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.4f
  • 14:24 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.4f
  • 14:24 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.4e
  • 14:23 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.4e
  • 14:23 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.4d
  • 14:23 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.4d
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.4c
  • 14:22 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.4c
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.4b
  • 14:21 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.4b
  • 14:21 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.4a
  • 14:21 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.4a
  • 14:21 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.49
  • 14:20 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.49
  • 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.48
  • 14:19 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.48
  • 14:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.47
  • 14:19 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.47
  • 14:19 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.46
  • 14:18 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.46
  • 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.45
  • 14:17 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.45
  • 14:17 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.44
  • 14:17 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.44
  • 14:17 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.43
  • 14:16 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.43
  • 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.42
  • 14:16 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.42
  • 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.41
  • 14:15 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.41
  • 14:15 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.40
  • 14:14 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.40
  • 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.3f
  • 14:14 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.3f
  • 14:14 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.3e
  • 14:13 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.3e
  • 14:13 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.3d
  • 14:12 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.3d
  • 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.3c
  • 14:12 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.3c
  • 14:12 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.3b
  • 14:11 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.3b
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.3a
  • 14:11 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.3a
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.39
  • 14:10 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.39
  • 14:10 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.38
  • 14:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P80261 and previous config saved to /var/cache/conftool/dbconfig/20250729-140953-fceratto.json
  • 14:09 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.38
  • 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.37
  • 14:09 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.37
  • 14:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.36
  • 14:08 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.36
  • 14:08 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.35
  • 14:08 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.35
  • 14:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.34
  • 14:07 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.34
  • 14:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.33
  • 14:06 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.33
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.32
  • 14:06 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.32
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.31
  • 14:05 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.31
  • 14:05 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.30
  • 14:04 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.30
  • 14:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.2f
  • 14:04 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.2f
  • 14:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.2e
  • 14:03 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.2e
  • 14:03 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.2d
  • 14:03 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.2d
  • 14:03 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.2c
  • 14:02 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.2c
  • 14:02 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.2b
  • 14:01 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.2b
  • 14:01 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.2a
  • 14:01 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.2a
  • 14:01 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.29
  • 14:00 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.29
  • 14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.28
  • 14:00 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.28
  • 14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.27
  • 13:59 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.27
  • 13:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.26
  • 13:58 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.26
  • 13:58 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.25
  • 13:58 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.25
  • 13:58 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.24
  • 13:57 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.24
  • 13:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.23
  • 13:57 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.23
  • 13:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.22
  • 13:56 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.22
  • 13:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.21
  • 13:55 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.21
  • 13:55 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.20
  • 13:55 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.20
  • 13:55 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.1f
  • 13:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T399728)', diff saved to https://phabricator.wikimedia.org/P80260 and previous config saved to /var/cache/conftool/dbconfig/20250729-135445-fceratto.json
  • 13:54 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.1f
  • 13:54 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.1e
  • 13:53 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.1e
  • 13:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.1d
  • 13:53 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.1d
  • 13:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.1c
  • 13:52 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.1c
  • 13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.1b
  • 13:52 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.1b
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1259 (T399728)', diff saved to https://phabricator.wikimedia.org/P80259 and previous config saved to /var/cache/conftool/dbconfig/20250729-135154-fceratto.json
  • 13:51 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.1a
  • 13:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1259.eqiad.wmnet with reason: Maintenance
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T399728)', diff saved to https://phabricator.wikimedia.org/P80258 and previous config saved to /var/cache/conftool/dbconfig/20250729-135131-fceratto.json
  • 13:51 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.1a
  • 13:51 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.19
  • 13:50 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.19
  • 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.18
  • 13:50 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.18
  • 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.17
  • 13:49 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.17
  • 13:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.16
  • 13:49 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:49 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Enable new mobile search experience everywhere (not including empty search recommendations) (T380515) (duration: 11m 45s)
  • 13:48 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.16
  • 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.15
  • 13:48 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.15
  • 13:48 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.14
  • 13:47 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.14
  • 13:47 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.13
  • 13:47 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.13
  • 13:47 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.12
  • 13:46 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.12
  • 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.11
  • 13:45 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.11
  • 13:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.10
  • 13:45 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.10
  • 13:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.0f
  • 13:44 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.0f
  • 13:44 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.0e
  • 13:44 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.0e
  • 13:43 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.0d
  • 13:43 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, bwang: Continuing with sync
  • 13:43 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.0d
  • 13:43 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.0c
  • 13:42 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.0c
  • 13:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.0b
  • 13:42 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.0b
  • 13:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.0a
  • 13:41 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.0a
  • 13:41 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.09
  • 13:41 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.09
  • 13:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.08
  • 13:40 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.08
  • 13:40 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.07
  • 13:39 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.07
  • 13:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.06
  • 13:39 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, bwang: Backport for Enable new mobile search experience everywhere (not including empty search recommendations) (T380515) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:39 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.06
  • 13:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.05
  • 13:38 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.05
  • 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.04
  • 13:38 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.04
  • 13:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.03
  • 13:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.03
  • 13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.02
  • 13:37 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Enable new mobile search experience everywhere (not including empty search recommendations) (T380515)
  • 13:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.02
  • 13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.01
  • 13:37 Emperor: check container dbs for all commons original containers T400700
  • 13:37 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.01
  • 13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.00
  • 13:36 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.00
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P80257 and previous config saved to /var/cache/conftool/dbconfig/20250729-133623-fceratto.json
  • 13:31 Lucas_WMDE: lucaswerkmeister-wmde@deploy1003 ~ $ mwscript-k8s --follow --comment=T400644 -- namespaceDupes mkwikibooks --fix --add-prefix=T400644/ | tee T40064-2
  • 13:29 Lucas_WMDE: lucaswerkmeister-wmde@deploy1003 ~ $ mwscript-k8s --follow --comment=T400644 -- namespaceDupes mkwikibooks --fix | tee T400644
  • 13:26 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.04
  • 13:26 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.04
  • 13:26 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.03
  • 13:25 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.03
  • 13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.02
  • 13:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Localize mk.wikibooks sitename and metanamespace (T400644) (duration: 09m 14s)
  • 13:24 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.02
  • 13:24 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.01
  • 13:24 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.01
  • 13:22 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P80254 and previous config saved to /var/cache/conftool/dbconfig/20250729-132116-fceratto.json
  • 13:20 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, aleksandar: Continuing with sync
  • 13:18 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, aleksandar: Backport for Localize mk.wikibooks sitename and metanamespace (T400644) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:18 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.00
  • 13:17 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.00
  • 13:16 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Localize mk.wikibooks sitename and metanamespace (T400644)
  • 13:12 dani@deploy1003: Finished scap sync-world: Backport for Undeploy Readers Use Cases Survey v2 (T399736) (duration: 09m 19s)
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T399249)', diff saved to https://phabricator.wikimedia.org/P80252 and previous config saved to /var/cache/conftool/dbconfig/20250729-130936-marostegui.json
  • 13:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 13:08 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-thumb.6b
  • 13:06 dani@deploy1003: dani: Continuing with sync
  • 13:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T399728)', diff saved to https://phabricator.wikimedia.org/P80251 and previous config saved to /var/cache/conftool/dbconfig/20250729-130608-fceratto.json
  • 13:05 dani@deploy1003: dani: Backport for Undeploy Readers Use Cases Survey v2 (T399736) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T399728)', diff saved to https://phabricator.wikimedia.org/P80250 and previous config saved to /var/cache/conftool/dbconfig/20250729-130316-fceratto.json
  • 13:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 13:03 dani@deploy1003: Started scap sync-world: Backport for Undeploy Readers Use Cases Survey v2 (T399736)
  • 13:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T399728)', diff saved to https://phabricator.wikimedia.org/P80249 and previous config saved to /var/cache/conftool/dbconfig/20250729-130157-fceratto.json
  • 12:55 mvernon@cumin2002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-thumb.6b
  • 12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P80248 and previous config saved to /var/cache/conftool/dbconfig/20250729-124650-fceratto.json
  • 12:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P80246 and previous config saved to /var/cache/conftool/dbconfig/20250729-123142-fceratto.json
  • 12:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:27 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:26 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:26 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:25 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T399249)', diff saved to https://phabricator.wikimedia.org/P80245 and previous config saved to /var/cache/conftool/dbconfig/20250729-122352-marostegui.json
  • 12:21 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:20 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T399728)', diff saved to https://phabricator.wikimedia.org/P80244 and previous config saved to /var/cache/conftool/dbconfig/20250729-121635-fceratto.json
  • 12:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T399728)', diff saved to https://phabricator.wikimedia.org/P80243 and previous config saved to /var/cache/conftool/dbconfig/20250729-121343-fceratto.json
  • 12:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 12:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T399728)', diff saved to https://phabricator.wikimedia.org/P80242 and previous config saved to /var/cache/conftool/dbconfig/20250729-121321-fceratto.json
  • 12:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P80241 and previous config saved to /var/cache/conftool/dbconfig/20250729-115814-fceratto.json
  • 11:57 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 11:56 Amir1: dropping flaggedrevs tables in frwiki, bawiki, siwiki (T398944)
  • 11:48 ladsgroup@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 11:46 ladsgroup@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P80240 and previous config saved to /var/cache/conftool/dbconfig/20250729-114306-fceratto.json
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T399728)', diff saved to https://phabricator.wikimedia.org/P80239 and previous config saved to /var/cache/conftool/dbconfig/20250729-112759-fceratto.json
  • 11:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T399728)', diff saved to https://phabricator.wikimedia.org/P80238 and previous config saved to /var/cache/conftool/dbconfig/20250729-112515-fceratto.json
  • 11:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 11:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 11:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T399728)', diff saved to https://phabricator.wikimedia.org/P80237 and previous config saved to /var/cache/conftool/dbconfig/20250729-112354-fceratto.json
  • 11:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P80236 and previous config saved to /var/cache/conftool/dbconfig/20250729-110846-fceratto.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T399249)', diff saved to https://phabricator.wikimedia.org/P80235 and previous config saved to /var/cache/conftool/dbconfig/20250729-110637-marostegui.json
  • 11:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 11:03 fabfur: done upgrading haproxykafka to 0.3.12 on A:cp-eqsin (T400199)
  • 10:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P80234 and previous config saved to /var/cache/conftool/dbconfig/20250729-105339-fceratto.json
  • 10:49 fabfur: upgrading haproxykafka to 0.3.12 on A:cp-eqsin (and applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1173919 too) (T400199)
  • 10:46 ladsgroup@deploy1003: Finished scap sync-world: Backport for Reduce frequency of parsercache purge (T398806) (duration: 10m 26s)
  • 10:38 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T399728)', diff saved to https://phabricator.wikimedia.org/P80233 and previous config saved to /var/cache/conftool/dbconfig/20250729-103831-fceratto.json
  • 10:37 ladsgroup@deploy1003: ladsgroup: Backport for Reduce frequency of parsercache purge (T398806) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T399728)', diff saved to https://phabricator.wikimedia.org/P80232 and previous config saved to /var/cache/conftool/dbconfig/20250729-103555-fceratto.json
  • 10:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 10:35 ladsgroup@deploy1003: Started scap sync-world: Backport for Reduce frequency of parsercache purge (T398806)
  • 10:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T399728)', diff saved to https://phabricator.wikimedia.org/P80231 and previous config saved to /var/cache/conftool/dbconfig/20250729-103532-fceratto.json
  • 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P80230 and previous config saved to /var/cache/conftool/dbconfig/20250729-102025-fceratto.json
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P80228 and previous config saved to /var/cache/conftool/dbconfig/20250729-100517-fceratto.json
  • 10:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5027.eqsin.wmnet
  • 10:04 fabfur: repooling cp5027 (T400199) - note previous ticket # (T400620) was wrong
  • 09:59 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 09:56 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5027.eqsin.wmnet
  • 09:56 fabfur: depooling cp5027 and upgrading haproxykafka to version 0.3.12 (T400620)
  • 09:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T399249)', diff saved to https://phabricator.wikimedia.org/P80227 and previous config saved to /var/cache/conftool/dbconfig/20250729-095030-marostegui.json
  • 09:50 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T399728)', diff saved to https://phabricator.wikimedia.org/P80226 and previous config saved to /var/cache/conftool/dbconfig/20250729-095009-fceratto.json
  • 09:49 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 09:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 09:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T399728)', diff saved to https://phabricator.wikimedia.org/P80225 and previous config saved to /var/cache/conftool/dbconfig/20250729-094733-fceratto.json
  • 09:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 09:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T399728)', diff saved to https://phabricator.wikimedia.org/P80224 and previous config saved to /var/cache/conftool/dbconfig/20250729-094711-fceratto.json
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P80223 and previous config saved to /var/cache/conftool/dbconfig/20250729-093523-marostegui.json
  • 09:35 cmooney@dns2005: END - running authdns-update
  • 09:34 cmooney@dns2005: START - running authdns-update
  • 09:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P80222 and previous config saved to /var/cache/conftool/dbconfig/20250729-093204-fceratto.json
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P80221 and previous config saved to /var/cache/conftool/dbconfig/20250729-092015-marostegui.json
  • 09:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P80220 and previous config saved to /var/cache/conftool/dbconfig/20250729-091656-fceratto.json
  • 09:07 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2009.codfw.wmnet with OS bookworm
  • 09:07 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 09:06 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T399249)', diff saved to https://phabricator.wikimedia.org/P80219 and previous config saved to /var/cache/conftool/dbconfig/20250729-090507-marostegui.json
  • 09:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T399728)', diff saved to https://phabricator.wikimedia.org/P80218 and previous config saved to /var/cache/conftool/dbconfig/20250729-090149-fceratto.json
  • 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T399728)', diff saved to https://phabricator.wikimedia.org/P80217 and previous config saved to /var/cache/conftool/dbconfig/20250729-085857-fceratto.json
  • 08:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T399728)', diff saved to https://phabricator.wikimedia.org/P80216 and previous config saved to /var/cache/conftool/dbconfig/20250729-085834-fceratto.json
  • 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage
  • 08:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P80215 and previous config saved to /var/cache/conftool/dbconfig/20250729-084327-fceratto.json
  • 08:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage
  • 08:30 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P80214 and previous config saved to /var/cache/conftool/dbconfig/20250729-082819-fceratto.json
  • 08:16 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 08:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T399728)', diff saved to https://phabricator.wikimedia.org/P80213 and previous config saved to /var/cache/conftool/dbconfig/20250729-081312-fceratto.json
  • 08:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T399728)', diff saved to https://phabricator.wikimedia.org/P80212 and previous config saved to /var/cache/conftool/dbconfig/20250729-081033-fceratto.json
  • 08:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T399728)', diff saved to https://phabricator.wikimedia.org/P80211 and previous config saved to /var/cache/conftool/dbconfig/20250729-081007-fceratto.json
  • 08:09 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Introduce selectors - oblivian@cumin1003"
  • 08:09 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Introduce selectors - oblivian@cumin1003
  • 08:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2006.codfw.wmnet with OS bookworm
  • 08:09 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 08:09 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Introduce selectors - oblivian@cumin1003
  • 08:09 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Introduce selectors - oblivian@cumin1003"
  • 08:07 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 07:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P80210 and previous config saved to /var/cache/conftool/dbconfig/20250729-075500-fceratto.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T399249)', diff saved to https://phabricator.wikimedia.org/P80209 and previous config saved to /var/cache/conftool/dbconfig/20250729-075135-marostegui.json
  • 07:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T399249)', diff saved to https://phabricator.wikimedia.org/P80208 and previous config saved to /var/cache/conftool/dbconfig/20250729-075112-marostegui.json
  • 07:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2006.codfw.wmnet with reason: host reimage
  • 07:44 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2006.codfw.wmnet with reason: host reimage
  • 07:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P80207 and previous config saved to /var/cache/conftool/dbconfig/20250729-073953-fceratto.json
  • 07:38 fabfur: haproxykafka upgraded to 0.3.11 on A:cp (T400620)
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P80206 and previous config saved to /var/cache/conftool/dbconfig/20250729-073604-marostegui.json
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P80205 and previous config saved to /var/cache/conftool/dbconfig/20250729-072924-root.json
  • 07:29 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 07:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T399728)', diff saved to https://phabricator.wikimedia.org/P80204 and previous config saved to /var/cache/conftool/dbconfig/20250729-072445-fceratto.json
  • 07:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T399728)', diff saved to https://phabricator.wikimedia.org/P80203 and previous config saved to /var/cache/conftool/dbconfig/20250729-072153-fceratto.json
  • 07:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P80202 and previous config saved to /var/cache/conftool/dbconfig/20250729-072057-marostegui.json
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P80201 and previous config saved to /var/cache/conftool/dbconfig/20250729-071418-root.json
  • 07:09 fabfur: upgrading haproxykafka to 0.3.11 on A:cp (T400620)
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T399249)', diff saved to https://phabricator.wikimedia.org/P80200 and previous config saved to /var/cache/conftool/dbconfig/20250729-070549-marostegui.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P80199 and previous config saved to /var/cache/conftool/dbconfig/20250729-065910-root.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P80197 and previous config saved to /var/cache/conftool/dbconfig/20250729-064405-root.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1202 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P80196 and previous config saved to /var/cache/conftool/dbconfig/20250729-063657-marostegui.json
  • 06:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P80195 and previous config saved to /var/cache/conftool/dbconfig/20250729-062907-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P80194 and previous config saved to /var/cache/conftool/dbconfig/20250729-061401-root.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P80193 and previous config saved to /var/cache/conftool/dbconfig/20250729-055855-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P80192 and previous config saved to /var/cache/conftool/dbconfig/20250729-054349-root.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T399249)', diff saved to https://phabricator.wikimedia.org/P80191 and previous config saved to /var/cache/conftool/dbconfig/20250729-054206-marostegui.json
  • 05:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T399249)', diff saved to https://phabricator.wikimedia.org/P80190 and previous config saved to /var/cache/conftool/dbconfig/20250729-054142-marostegui.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P80189 and previous config saved to /var/cache/conftool/dbconfig/20250729-052843-root.json
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P80188 and previous config saved to /var/cache/conftool/dbconfig/20250729-052634-marostegui.json
  • 05:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P80187 and previous config saved to /var/cache/conftool/dbconfig/20250729-051127-marostegui.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T399249)', diff saved to https://phabricator.wikimedia.org/P80186 and previous config saved to /var/cache/conftool/dbconfig/20250729-045619-marostegui.json
  • 04:02 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.9 (duration: 01m 56s)
  • 03:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T399249)', diff saved to https://phabricator.wikimedia.org/P80185 and previous config saved to /var/cache/conftool/dbconfig/20250729-033111-marostegui.json
  • 03:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 03:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T399249)', diff saved to https://phabricator.wikimedia.org/P80184 and previous config saved to /var/cache/conftool/dbconfig/20250729-033048-marostegui.json
  • 03:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P80183 and previous config saved to /var/cache/conftool/dbconfig/20250729-031540-marostegui.json
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.12 refs T396373
  • 03:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P80182 and previous config saved to /var/cache/conftool/dbconfig/20250729-030033-marostegui.json
  • 02:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T399249)', diff saved to https://phabricator.wikimedia.org/P80181 and previous config saved to /var/cache/conftool/dbconfig/20250729-024525-marostegui.json
  • 01:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T399249)', diff saved to https://phabricator.wikimedia.org/P80180 and previous config saved to /var/cache/conftool/dbconfig/20250729-011911-marostegui.json
  • 01:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T399249)', diff saved to https://phabricator.wikimedia.org/P80179 and previous config saved to /var/cache/conftool/dbconfig/20250729-011848-marostegui.json
  • 01:05 eileen: * config revision changed from 1081636a to 9fff33b8
  • 01:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P80178 and previous config saved to /var/cache/conftool/dbconfig/20250729-010340-marostegui.json
  • 00:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P80177 and previous config saved to /var/cache/conftool/dbconfig/20250729-004833-marostegui.json
  • 00:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T399249)', diff saved to https://phabricator.wikimedia.org/P80176 and previous config saved to /var/cache/conftool/dbconfig/20250729-003325-marostegui.json
  • 00:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T399728)', diff saved to https://phabricator.wikimedia.org/P80175 and previous config saved to /var/cache/conftool/dbconfig/20250729-002159-fceratto.json
  • 00:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P80174 and previous config saved to /var/cache/conftool/dbconfig/20250729-000651-fceratto.json

2025-07-28

  • 23:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P80173 and previous config saved to /var/cache/conftool/dbconfig/20250728-235143-fceratto.json
  • 23:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T399728)', diff saved to https://phabricator.wikimedia.org/P80172 and previous config saved to /var/cache/conftool/dbconfig/20250728-233635-fceratto.json
  • 23:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T399728)', diff saved to https://phabricator.wikimedia.org/P80171 and previous config saved to /var/cache/conftool/dbconfig/20250728-233340-fceratto.json
  • 23:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 23:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T399728)', diff saved to https://phabricator.wikimedia.org/P80170 and previous config saved to /var/cache/conftool/dbconfig/20250728-233317-fceratto.json
  • 23:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P80169 and previous config saved to /var/cache/conftool/dbconfig/20250728-231810-fceratto.json
  • 23:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T399249)', diff saved to https://phabricator.wikimedia.org/P80168 and previous config saved to /var/cache/conftool/dbconfig/20250728-230758-marostegui.json
  • 23:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T399249)', diff saved to https://phabricator.wikimedia.org/P80167 and previous config saved to /var/cache/conftool/dbconfig/20250728-230735-marostegui.json
  • 23:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P80166 and previous config saved to /var/cache/conftool/dbconfig/20250728-230302-fceratto.json
  • 22:52 kemayo@deploy1003: Finished scap sync-world: Backport for Tone check: don't cause an error when the model fails, Edit check: skip collapsed ranges when computing modified content branch nodes (T400573) (duration: 09m 31s)
  • 22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P80165 and previous config saved to /var/cache/conftool/dbconfig/20250728-225227-marostegui.json
  • 22:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T399728)', diff saved to https://phabricator.wikimedia.org/P80164 and previous config saved to /var/cache/conftool/dbconfig/20250728-224754-fceratto.json
  • 22:47 kemayo@deploy1003: kemayo: Continuing with sync
  • 22:45 kemayo@deploy1003: kemayo: Backport for Tone check: don't cause an error when the model fails, Edit check: skip collapsed ranges when computing modified content branch nodes (T400573) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T399728)', diff saved to https://phabricator.wikimedia.org/P80163 and previous config saved to /var/cache/conftool/dbconfig/20250728-224459-fceratto.json
  • 22:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 22:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T399728)', diff saved to https://phabricator.wikimedia.org/P80162 and previous config saved to /var/cache/conftool/dbconfig/20250728-224436-fceratto.json
  • 22:42 kemayo@deploy1003: Started scap sync-world: Backport for Tone check: don't cause an error when the model fails, Edit check: skip collapsed ranges when computing modified content branch nodes (T400573)
  • 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P80161 and previous config saved to /var/cache/conftool/dbconfig/20250728-223720-marostegui.json
  • 22:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P80160 and previous config saved to /var/cache/conftool/dbconfig/20250728-222929-fceratto.json
  • 22:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T399249)', diff saved to https://phabricator.wikimedia.org/P80159 and previous config saved to /var/cache/conftool/dbconfig/20250728-222212-marostegui.json
  • 22:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P80158 and previous config saved to /var/cache/conftool/dbconfig/20250728-221421-fceratto.json
  • 22:11 maryum: security deploy for multiple patches including T400526 T395858 T400500 T400545
  • 21:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T399728)', diff saved to https://phabricator.wikimedia.org/P80157 and previous config saved to /var/cache/conftool/dbconfig/20250728-215914-fceratto.json
  • 21:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T399728)', diff saved to https://phabricator.wikimedia.org/P80156 and previous config saved to /var/cache/conftool/dbconfig/20250728-215619-fceratto.json
  • 21:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 21:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T399728)', diff saved to https://phabricator.wikimedia.org/P80155 and previous config saved to /var/cache/conftool/dbconfig/20250728-215556-fceratto.json
  • 21:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P80154 and previous config saved to /var/cache/conftool/dbconfig/20250728-214049-fceratto.json
  • 21:29 sbassett@deploy1003: Finished scap sync-world: Backport for SECURITY: Fix stored i18n XSS through href attributes (duration: 08m 24s)
  • 21:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P80153 and previous config saved to /var/cache/conftool/dbconfig/20250728-212541-fceratto.json
  • 21:24 sbassett@deploy1003: sbassett: Continuing with sync
  • 21:23 sbassett@deploy1003: sbassett: Backport for SECURITY: Fix stored i18n XSS through href attributes synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:21 sbassett@deploy1003: Started scap sync-world: Backport for SECURITY: Fix stored i18n XSS through href attributes
  • 21:19 cjming: end of UTC late backport window
  • 21:18 cscott@deploy1003: Finished scap sync-world: Backport for Deploy Parsoid Read Views to 39 Wikipedias (T400510) (duration: 10m 23s)
  • 21:13 cscott@deploy1003: arlolra, cscott: Continuing with sync
  • 21:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T399728)', diff saved to https://phabricator.wikimedia.org/P80152 and previous config saved to /var/cache/conftool/dbconfig/20250728-211034-fceratto.json
  • 21:10 cscott@deploy1003: arlolra, cscott: Backport for Deploy Parsoid Read Views to 39 Wikipedias (T400510) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:08 cscott@deploy1003: Started scap sync-world: Backport for Deploy Parsoid Read Views to 39 Wikipedias (T400510)
  • 21:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T399728)', diff saved to https://phabricator.wikimedia.org/P80151 and previous config saved to /var/cache/conftool/dbconfig/20250728-210738-fceratto.json
  • 21:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 21:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 21:06 cjming@deploy1003: Finished scap sync-world: Backport for Enable AA test on 50 wikis (T399486) (duration: 08m 39s)
  • 21:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 21:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T399728)', diff saved to https://phabricator.wikimedia.org/P80150 and previous config saved to /var/cache/conftool/dbconfig/20250728-210513-fceratto.json
  • 21:00 cjming@deploy1003: ksarabia, cjming: Continuing with sync
  • 20:59 cjming@deploy1003: ksarabia, cjming: Backport for Enable AA test on 50 wikis (T399486) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:57 cjming@deploy1003: Started scap sync-world: Backport for Enable AA test on 50 wikis (T399486)
  • 20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T399249)', diff saved to https://phabricator.wikimedia.org/P80147 and previous config saved to /var/cache/conftool/dbconfig/20250728-205559-marostegui.json
  • 20:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T399249)', diff saved to https://phabricator.wikimedia.org/P80146 and previous config saved to /var/cache/conftool/dbconfig/20250728-205536-marostegui.json
  • 20:53 cjming@deploy1003: Finished scap sync-world: Backport for aswikisource: add publisher (প্ৰকাশক) namespace (T399269) (duration: 08m 56s)
  • 20:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P80145 and previous config saved to /var/cache/conftool/dbconfig/20250728-205005-fceratto.json
  • 20:47 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 20:46 cjming@deploy1003: cjming, anzx: Backport for aswikisource: add publisher (প্ৰকাশক) namespace (T399269) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:44 cjming@deploy1003: Started scap sync-world: Backport for aswikisource: add publisher (প্ৰকাশক) namespace (T399269)
  • 20:40 cjming@deploy1003: Finished scap sync-world: Backport for mnwwiktionary: update reconstruction namespace (T400441) (duration: 09m 08s)
  • 20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P80144 and previous config saved to /var/cache/conftool/dbconfig/20250728-204028-marostegui.json
  • 20:35 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 20:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P80143 and previous config saved to /var/cache/conftool/dbconfig/20250728-203458-fceratto.json
  • 20:33 cjming@deploy1003: cjming, anzx: Backport for mnwwiktionary: update reconstruction namespace (T400441) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:31 cjming@deploy1003: Started scap sync-world: Backport for mnwwiktionary: update reconstruction namespace (T400441)
  • 20:27 cjming@deploy1003: Finished scap sync-world: Backport for throttle: add rules for Wikimania 2025 (T400276) (duration: 08m 39s)
  • 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P80142 and previous config saved to /var/cache/conftool/dbconfig/20250728-202521-marostegui.json
  • 20:21 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 20:20 cjming@deploy1003: cjming, anzx: Backport for throttle: add rules for Wikimania 2025 (T400276) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T399728)', diff saved to https://phabricator.wikimedia.org/P80141 and previous config saved to /var/cache/conftool/dbconfig/20250728-201950-fceratto.json
  • 20:18 cjming@deploy1003: Started scap sync-world: Backport for throttle: add rules for Wikimania 2025 (T400276)
  • 20:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T399728)', diff saved to https://phabricator.wikimedia.org/P80140 and previous config saved to /var/cache/conftool/dbconfig/20250728-201655-fceratto.json
  • 20:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 20:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T399728)', diff saved to https://phabricator.wikimedia.org/P80139 and previous config saved to /var/cache/conftool/dbconfig/20250728-201633-fceratto.json
  • 20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T399249)', diff saved to https://phabricator.wikimedia.org/P80138 and previous config saved to /var/cache/conftool/dbconfig/20250728-201013-marostegui.json
  • 20:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P80137 and previous config saved to /var/cache/conftool/dbconfig/20250728-200125-fceratto.json
  • 19:47 dancy@deploy1003: Finished deploy [releng/jenkins-deploy@b89eed0] (releasing): Disabling the MediaWiki publish WMF single-version image job (T398873) (duration: 01m 11s)
  • 19:46 dancy@deploy1003: Started deploy [releng/jenkins-deploy@b89eed0] (releasing): Disabling the MediaWiki publish WMF single-version image job (T398873)
  • 19:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P80136 and previous config saved to /var/cache/conftool/dbconfig/20250728-194618-fceratto.json
  • 19:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T399728)', diff saved to https://phabricator.wikimedia.org/P80135 and previous config saved to /var/cache/conftool/dbconfig/20250728-193110-fceratto.json
  • 19:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T399728)', diff saved to https://phabricator.wikimedia.org/P80134 and previous config saved to /var/cache/conftool/dbconfig/20250728-192817-fceratto.json
  • 19:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 19:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T399728)', diff saved to https://phabricator.wikimedia.org/P80133 and previous config saved to /var/cache/conftool/dbconfig/20250728-192754-fceratto.json
  • 19:27 fabfur: restarting haproxykafka on A:cp-ulsfo (T400620)
  • 19:17 fabfur: updating haproxykafka to v0.3.11 on A:cp-ulsfo (T400620)
  • 19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P80132 and previous config saved to /var/cache/conftool/dbconfig/20250728-191247-fceratto.json
  • 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P80131 and previous config saved to /var/cache/conftool/dbconfig/20250728-185739-fceratto.json
  • 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T399728)', diff saved to https://phabricator.wikimedia.org/P80130 and previous config saved to /var/cache/conftool/dbconfig/20250728-184232-fceratto.json
  • 18:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T399728)', diff saved to https://phabricator.wikimedia.org/P80129 and previous config saved to /var/cache/conftool/dbconfig/20250728-183939-fceratto.json
  • 18:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 18:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T399728)', diff saved to https://phabricator.wikimedia.org/P80128 and previous config saved to /var/cache/conftool/dbconfig/20250728-183916-fceratto.json
  • 18:26 ejegg: payments-wiki upgraded from b942983f to a0f165ad
  • 18:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P80127 and previous config saved to /var/cache/conftool/dbconfig/20250728-182409-fceratto.json
  • 18:22 fabfur: haproxykafka 0.3.11 uploaded to apt repo (T400620)
  • 18:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T399249)', diff saved to https://phabricator.wikimedia.org/P80126 and previous config saved to /var/cache/conftool/dbconfig/20250728-182134-marostegui.json
  • 18:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T399249)', diff saved to https://phabricator.wikimedia.org/P80125 and previous config saved to /var/cache/conftool/dbconfig/20250728-182111-marostegui.json
  • 18:15 fabfur: repooled cp4037 (T400620)
  • 18:15 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 18:14 fabfur: depooling cp4037 to upgrade new haproxykafka version (T400620)
  • 18:13 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 18:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P80124 and previous config saved to /var/cache/conftool/dbconfig/20250728-180901-fceratto.json
  • 18:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P80123 and previous config saved to /var/cache/conftool/dbconfig/20250728-180603-marostegui.json
  • 17:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T399728)', diff saved to https://phabricator.wikimedia.org/P80122 and previous config saved to /var/cache/conftool/dbconfig/20250728-175354-fceratto.json
  • 17:51 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 17:51 fabfur: repooling cp4037 after upgrading haproxykafka (T400620)
  • 17:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T399728)', diff saved to https://phabricator.wikimedia.org/P80121 and previous config saved to /var/cache/conftool/dbconfig/20250728-175056-fceratto.json
  • 17:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P80120 and previous config saved to /var/cache/conftool/dbconfig/20250728-175055-marostegui.json
  • 17:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 17:46 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 17:45 fabfur: depooling cp4037 to upgrade to latest haproxykafka version (0.3.11) (T400620)
  • 17:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T399249)', diff saved to https://phabricator.wikimedia.org/P80119 and previous config saved to /var/cache/conftool/dbconfig/20250728-173548-marostegui.json
  • 16:50 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 16:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:44 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 16:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:37 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 16:37 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 16:36 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 16:36 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 16:35 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 16:35 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:34 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 16:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 16:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 16:23 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 16:22 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 16:21 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 16:21 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:20 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 16:20 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:20 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:20 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 16:19 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 16:19 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:18 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 16:18 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 16:18 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 16:16 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 16:16 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 16:16 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 16:15 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 16:15 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:15 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 16:15 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:14 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:14 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 16:13 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 16:13 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:12 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 16:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:12 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 16:12 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 16:11 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:11 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:11 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 16:10 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 16:10 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 16:09 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 16:09 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:08 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 16:08 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 16:08 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 16:08 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:07 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 16:07 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 16:07 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 16:02 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T399249)', diff saved to https://phabricator.wikimedia.org/P80118 and previous config saved to /var/cache/conftool/dbconfig/20250728-154114-marostegui.json
  • 15:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 15:38 dancy@deploy1003: Installation of scap version "4.192.0" completed for 180 hosts
  • 15:34 dancy@deploy1003: Installing scap version "4.192.0" for 180 host(s)
  • 15:27 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:59 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for objectcache: Only clean a subset of tables in SqlBagOStuff (T398806) (duration: 12m 30s)
  • 14:52 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2038
  • 14:52 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2038
  • 14:51 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 14:49 ladsgroup@deploy1003: ladsgroup: Backport for objectcache: Only clean a subset of tables in SqlBagOStuff (T398806) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:48 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:44 ladsgroup@deploy1003: Started scap sync-world: Backport for objectcache: Only clean a subset of tables in SqlBagOStuff (T398806)
  • 14:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:20 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for fix: avoid using wikitext that triggers ping notifications (T400369) (duration: 39m 52s)
  • 14:07 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, migr: Continuing with sync
  • 14:04 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, migr: Backport for fix: avoid using wikitext that triggers ping notifications (T400369) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:53 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 13:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T399249)', diff saved to https://phabricator.wikimedia.org/P80115 and previous config saved to /var/cache/conftool/dbconfig/20250728-135111-marostegui.json
  • 13:49 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:48 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 13:47 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 13:46 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 13:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T399728)', diff saved to https://phabricator.wikimedia.org/P80114 and previous config saved to /var/cache/conftool/dbconfig/20250728-134632-fceratto.json
  • 13:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:40 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for fix: avoid using wikitext that triggers ping notifications (T400369)
  • 13:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P80113 and previous config saved to /var/cache/conftool/dbconfig/20250728-133604-marostegui.json
  • 13:35 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Growth: enable new way of refreshing LinkRecommendations for more wikis (T386250 T392944), Echo: be explicit about special wikis using Wikipedia logo (T400070) (duration: 08m 06s)
  • 13:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P80112 and previous config saved to /var/cache/conftool/dbconfig/20250728-133124-fceratto.json
  • 13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, migr: Continuing with sync
  • 13:29 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, migr: Backport for Growth: enable new way of refreshing LinkRecommendations for more wikis (T386250 T392944), Echo: be explicit about special wikis using Wikipedia logo (T400070) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Growth: enable new way of refreshing LinkRecommendations for more wikis (T386250 T392944), Echo: be explicit about special wikis using Wikipedia logo (T400070)
  • 13:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Revert "aswikisource: add publisher (প্ৰকাশক) namespace" (T399269) (duration: 08m 28s)
  • 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P80110 and previous config saved to /var/cache/conftool/dbconfig/20250728-132056-marostegui.json
  • 13:19 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
  • 13:18 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Revert "aswikisource: add publisher (প্ৰকাশক) namespace" (T399269) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:16 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Revert "aswikisource: add publisher (প্ৰকাশক) namespace" (T399269)
  • 13:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P80109 and previous config saved to /var/cache/conftool/dbconfig/20250728-131617-fceratto.json
  • 13:14 lucaswerkmeister-wmde@deploy1003: Sync cancelled.
  • 13:09 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Backport for aswikisource: add publisher (প্ৰকাশক) namespace (T399269) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for aswikisource: add publisher (প্ৰকাশক) namespace (T399269)
  • 13:06 sukhe: sukhe@idp1004:~$ sudo systemctl restart tomcat10.service
  • 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T399249)', diff saved to https://phabricator.wikimedia.org/P80108 and previous config saved to /var/cache/conftool/dbconfig/20250728-130549-marostegui.json
  • 13:04 reedy@deploy1003: Finished scap sync-world: Backport for Allow index dump from non-managed cluster (T400158) (duration: 13m 39s)
  • 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T399249)', diff saved to https://phabricator.wikimedia.org/P80107 and previous config saved to /var/cache/conftool/dbconfig/20250728-130441-marostegui.json
  • 13:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T399249)', diff saved to https://phabricator.wikimedia.org/P80106 and previous config saved to /var/cache/conftool/dbconfig/20250728-130418-marostegui.json
  • 13:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T399728)', diff saved to https://phabricator.wikimedia.org/P80105 and previous config saved to /var/cache/conftool/dbconfig/20250728-130109-fceratto.json
  • 12:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T399728)', diff saved to https://phabricator.wikimedia.org/P80104 and previous config saved to /var/cache/conftool/dbconfig/20250728-125818-fceratto.json
  • 12:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T399728)', diff saved to https://phabricator.wikimedia.org/P80103 and previous config saved to /var/cache/conftool/dbconfig/20250728-125756-fceratto.json
  • 12:56 reedy@deploy1003: reedy: Continuing with sync
  • 12:55 reedy@deploy1003: reedy: Backport for Allow index dump from non-managed cluster (T400158) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:51 reedy@deploy1003: Started scap sync-world: Backport for Allow index dump from non-managed cluster (T400158)
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P80102 and previous config saved to /var/cache/conftool/dbconfig/20250728-124910-marostegui.json
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P80101 and previous config saved to /var/cache/conftool/dbconfig/20250728-124249-fceratto.json
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P80100 and previous config saved to /var/cache/conftool/dbconfig/20250728-123403-marostegui.json
  • 12:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P80099 and previous config saved to /var/cache/conftool/dbconfig/20250728-122741-fceratto.json
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T399249)', diff saved to https://phabricator.wikimedia.org/P80098 and previous config saved to /var/cache/conftool/dbconfig/20250728-121855-marostegui.json
  • 12:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T399249)', diff saved to https://phabricator.wikimedia.org/P80097 and previous config saved to /var/cache/conftool/dbconfig/20250728-121747-marostegui.json
  • 12:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 12:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T399249)', diff saved to https://phabricator.wikimedia.org/P80096 and previous config saved to /var/cache/conftool/dbconfig/20250728-121725-marostegui.json
  • 12:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T399728)', diff saved to https://phabricator.wikimedia.org/P80095 and previous config saved to /var/cache/conftool/dbconfig/20250728-121234-fceratto.json
  • 12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T399728)', diff saved to https://phabricator.wikimedia.org/P80094 and previous config saved to /var/cache/conftool/dbconfig/20250728-120839-fceratto.json
  • 12:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T399728)', diff saved to https://phabricator.wikimedia.org/P80093 and previous config saved to /var/cache/conftool/dbconfig/20250728-120816-fceratto.json
  • 12:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P80092 and previous config saved to /var/cache/conftool/dbconfig/20250728-120217-marostegui.json
  • 11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P80091 and previous config saved to /var/cache/conftool/dbconfig/20250728-115309-fceratto.json
  • 11:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P80090 and previous config saved to /var/cache/conftool/dbconfig/20250728-114710-marostegui.json
  • 11:47 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 11:46 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:46 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:45 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:44 ladsgroup@deploy1003: Finished scap sync-world: Backport for ParserCache: Enable purgePeriod for SqlBagOStuff (T398806) (duration: 19m 51s)
  • 11:43 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:43 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P80089 and previous config saved to /var/cache/conftool/dbconfig/20250728-113801-fceratto.json
  • 11:34 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T399249)', diff saved to https://phabricator.wikimedia.org/P80088 and previous config saved to /var/cache/conftool/dbconfig/20250728-113202-marostegui.json
  • 11:31 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T399249)', diff saved to https://phabricator.wikimedia.org/P80087 and previous config saved to /var/cache/conftool/dbconfig/20250728-113054-marostegui.json
  • 11:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@deploy1003: ladsgroup: Backport for ParserCache: Enable purgePeriod for SqlBagOStuff (T398806) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T399249)', diff saved to https://phabricator.wikimedia.org/P80086 and previous config saved to /var/cache/conftool/dbconfig/20250728-113031-marostegui.json
  • 11:24 ladsgroup@deploy1003: Started scap sync-world: Backport for ParserCache: Enable purgePeriod for SqlBagOStuff (T398806)
  • 11:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T399728)', diff saved to https://phabricator.wikimedia.org/P80085 and previous config saved to /var/cache/conftool/dbconfig/20250728-112254-fceratto.json
  • 11:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T399728)', diff saved to https://phabricator.wikimedia.org/P80084 and previous config saved to /var/cache/conftool/dbconfig/20250728-111958-fceratto.json
  • 11:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 11:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 11:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T399728)', diff saved to https://phabricator.wikimedia.org/P80083 and previous config saved to /var/cache/conftool/dbconfig/20250728-111730-fceratto.json
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P80082 and previous config saved to /var/cache/conftool/dbconfig/20250728-111524-marostegui.json
  • 11:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:11 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P80081 and previous config saved to /var/cache/conftool/dbconfig/20250728-110222-fceratto.json
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P80080 and previous config saved to /var/cache/conftool/dbconfig/20250728-110016-marostegui.json
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P80079 and previous config saved to /var/cache/conftool/dbconfig/20250728-105323-root.json
  • 10:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P80078 and previous config saved to /var/cache/conftool/dbconfig/20250728-104715-fceratto.json
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T399249)', diff saved to https://phabricator.wikimedia.org/P80077 and previous config saved to /var/cache/conftool/dbconfig/20250728-104508-marostegui.json
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T399249)', diff saved to https://phabricator.wikimedia.org/P80076 and previous config saved to /var/cache/conftool/dbconfig/20250728-104400-marostegui.json
  • 10:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T399249)', diff saved to https://phabricator.wikimedia.org/P80075 and previous config saved to /var/cache/conftool/dbconfig/20250728-104337-marostegui.json
  • 10:41 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:40 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:39 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P80074 and previous config saved to /var/cache/conftool/dbconfig/20250728-103817-root.json
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T399728)', diff saved to https://phabricator.wikimedia.org/P80073 and previous config saved to /var/cache/conftool/dbconfig/20250728-103208-fceratto.json
  • 10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T399728)', diff saved to https://phabricator.wikimedia.org/P80072 and previous config saved to /var/cache/conftool/dbconfig/20250728-102918-fceratto.json
  • 10:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T399728)', diff saved to https://phabricator.wikimedia.org/P80071 and previous config saved to /var/cache/conftool/dbconfig/20250728-102856-fceratto.json
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P80070 and previous config saved to /var/cache/conftool/dbconfig/20250728-102830-marostegui.json
  • 10:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 10:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P80069 and previous config saved to /var/cache/conftool/dbconfig/20250728-102444-root.json
  • 10:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1012.eqiad.wmnet with OS trixie
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P80068 and previous config saved to /var/cache/conftool/dbconfig/20250728-102311-root.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P80067 and previous config saved to /var/cache/conftool/dbconfig/20250728-102300-root.json
  • 10:17 btullis@deploy1003: Finished scap build-images: Updating mediawiki-cli image for T400383 (duration: 16m 00s)
  • 10:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P80066 and previous config saved to /var/cache/conftool/dbconfig/20250728-101348-fceratto.json
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P80065 and previous config saved to /var/cache/conftool/dbconfig/20250728-101322-marostegui.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P80064 and previous config saved to /var/cache/conftool/dbconfig/20250728-100938-root.json
  • 10:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P80063 and previous config saved to /var/cache/conftool/dbconfig/20250728-100806-root.json
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P80062 and previous config saved to /var/cache/conftool/dbconfig/20250728-100754-root.json
  • 10:05 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 10:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2162 T400599', diff saved to https://phabricator.wikimedia.org/P80061 and previous config saved to /var/cache/conftool/dbconfig/20250728-100243-marostegui.json
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2241 to x3 primary T400599', diff saved to https://phabricator.wikimedia.org/P80060 and previous config saved to /var/cache/conftool/dbconfig/20250728-100208-root.json
  • 10:01 marostegui: Starting x3 codfw failover from db2162 to db2241 - T400599
  • 10:01 btullis@deploy1003: Started scap build-images: Updating mediawiki-cli image for T400383
  • 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P80059 and previous config saved to /var/cache/conftool/dbconfig/20250728-095841-fceratto.json
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T399249)', diff saved to https://phabricator.wikimedia.org/P80058 and previous config saved to /var/cache/conftool/dbconfig/20250728-095815-marostegui.json
  • 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T399249)', diff saved to https://phabricator.wikimedia.org/P80057 and previous config saved to /var/cache/conftool/dbconfig/20250728-095601-marostegui.json
  • 09:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 09:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 16 hosts with reason: Primary switchover x3 T400599
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T399249)', diff saved to https://phabricator.wikimedia.org/P80056 and previous config saved to /var/cache/conftool/dbconfig/20250728-095539-marostegui.json
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P80055 and previous config saved to /var/cache/conftool/dbconfig/20250728-095432-root.json
  • 09:53 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS trixie
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P80054 and previous config saved to /var/cache/conftool/dbconfig/20250728-095249-root.json
  • 09:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T399728)', diff saved to https://phabricator.wikimedia.org/P80053 and previous config saved to /var/cache/conftool/dbconfig/20250728-094333-fceratto.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P80052 and previous config saved to /var/cache/conftool/dbconfig/20250728-094031-marostegui.json
  • 09:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T399728)', diff saved to https://phabricator.wikimedia.org/P80051 and previous config saved to /var/cache/conftool/dbconfig/20250728-093945-fceratto.json
  • 09:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P80050 and previous config saved to /var/cache/conftool/dbconfig/20250728-093926-root.json
  • 09:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T399728)', diff saved to https://phabricator.wikimedia.org/P80049 and previous config saved to /var/cache/conftool/dbconfig/20250728-093922-fceratto.json
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P80048 and previous config saved to /var/cache/conftool/dbconfig/20250728-093743-root.json
  • 09:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:29 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:29 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P80047 and previous config saved to /var/cache/conftool/dbconfig/20250728-092524-marostegui.json
  • 09:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P80046 and previous config saved to /var/cache/conftool/dbconfig/20250728-092421-root.json
  • 09:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P80045 and previous config saved to /var/cache/conftool/dbconfig/20250728-092414-fceratto.json
  • 09:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P80044 and previous config saved to /var/cache/conftool/dbconfig/20250728-092237-root.json
  • 09:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2218,2243].codfw.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T399249)', diff saved to https://phabricator.wikimedia.org/P80043 and previous config saved to /var/cache/conftool/dbconfig/20250728-091016-marostegui.json
  • 09:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P80042 and previous config saved to /var/cache/conftool/dbconfig/20250728-090907-fceratto.json
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T399249)', diff saved to https://phabricator.wikimedia.org/P80041 and previous config saved to /var/cache/conftool/dbconfig/20250728-090802-marostegui.json
  • 09:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T399249)', diff saved to https://phabricator.wikimedia.org/P80040 and previous config saved to /var/cache/conftool/dbconfig/20250728-090739-marostegui.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T400591', diff saved to https://phabricator.wikimedia.org/P80039 and previous config saved to /var/cache/conftool/dbconfig/20250728-090407-marostegui.json
  • 09:03 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2220 to s7 primary T400591', diff saved to https://phabricator.wikimedia.org/P80038 and previous config saved to /var/cache/conftool/dbconfig/20250728-090314-root.json
  • 09:02 marostegui: Starting s7 codfw failover from db2218 to db2220 - T400591
  • 08:59 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2220 from API/vslow/dump T400591', diff saved to https://phabricator.wikimedia.org/P80037 and previous config saved to /var/cache/conftool/dbconfig/20250728-085912-root.json
  • 08:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T400591
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2220 with weight 0 T400591', diff saved to https://phabricator.wikimedia.org/P80036 and previous config saved to /var/cache/conftool/dbconfig/20250728-085840-root.json
  • 08:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T399728)', diff saved to https://phabricator.wikimedia.org/P80035 and previous config saved to /var/cache/conftool/dbconfig/20250728-085359-fceratto.json
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P80034 and previous config saved to /var/cache/conftool/dbconfig/20250728-085231-marostegui.json
  • 08:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T399728)', diff saved to https://phabricator.wikimedia.org/P80033 and previous config saved to /var/cache/conftool/dbconfig/20250728-085004-fceratto.json
  • 08:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 08:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T399728)', diff saved to https://phabricator.wikimedia.org/P80032 and previous config saved to /var/cache/conftool/dbconfig/20250728-084941-fceratto.json
  • 08:49 hashar@deploy1003: Finished deploy [integration/docroot@827d626]: build: Updating brace-expansion to 1.1.12, 2.0.2 (duration: 00m 13s)
  • 08:48 hashar@deploy1003: Started deploy [integration/docroot@827d626]: build: Updating brace-expansion to 1.1.12, 2.0.2
  • 08:47 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:46 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:38 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:38 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P80031 and previous config saved to /var/cache/conftool/dbconfig/20250728-083724-marostegui.json
  • 08:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P80030 and previous config saved to /var/cache/conftool/dbconfig/20250728-083433-fceratto.json
  • 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T399249)', diff saved to https://phabricator.wikimedia.org/P80029 and previous config saved to /var/cache/conftool/dbconfig/20250728-082216-marostegui.json
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T399249)', diff saved to https://phabricator.wikimedia.org/P80028 and previous config saved to /var/cache/conftool/dbconfig/20250728-082002-marostegui.json
  • 08:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T399249)', diff saved to https://phabricator.wikimedia.org/P80027 and previous config saved to /var/cache/conftool/dbconfig/20250728-081939-marostegui.json
  • 08:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P80026 and previous config saved to /var/cache/conftool/dbconfig/20250728-081926-fceratto.json
  • 08:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P80025 and previous config saved to /var/cache/conftool/dbconfig/20250728-080940-root.json
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P80024 and previous config saved to /var/cache/conftool/dbconfig/20250728-080432-marostegui.json
  • 08:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T399728)', diff saved to https://phabricator.wikimedia.org/P80023 and previous config saved to /var/cache/conftool/dbconfig/20250728-080418-fceratto.json
  • 08:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T399728)', diff saved to https://phabricator.wikimedia.org/P80022 and previous config saved to /var/cache/conftool/dbconfig/20250728-080026-fceratto.json
  • 08:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P80021 and previous config saved to /var/cache/conftool/dbconfig/20250728-075435-root.json
  • 07:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Secondary switchover s7 T400591
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P80020 and previous config saved to /var/cache/conftool/dbconfig/20250728-074924-marostegui.json
  • 07:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P80019 and previous config saved to /var/cache/conftool/dbconfig/20250728-073929-root.json
  • 07:38 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T399249)', diff saved to https://phabricator.wikimedia.org/P80018 and previous config saved to /var/cache/conftool/dbconfig/20250728-073417-marostegui.json
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T399249)', diff saved to https://phabricator.wikimedia.org/P80016 and previous config saved to /var/cache/conftool/dbconfig/20250728-073203-marostegui.json
  • 07:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T399249)', diff saved to https://phabricator.wikimedia.org/P80015 and previous config saved to /var/cache/conftool/dbconfig/20250728-073119-marostegui.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P80014 and previous config saved to /var/cache/conftool/dbconfig/20250728-072423-root.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2220 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P80013 and previous config saved to /var/cache/conftool/dbconfig/20250728-071643-marostegui.json
  • 07:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P80012 and previous config saved to /var/cache/conftool/dbconfig/20250728-071611-marostegui.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P80011 and previous config saved to /var/cache/conftool/dbconfig/20250728-070103-marostegui.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T399249)', diff saved to https://phabricator.wikimedia.org/P80010 and previous config saved to /var/cache/conftool/dbconfig/20250728-064556-marostegui.json
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T399249)', diff saved to https://phabricator.wikimedia.org/P80009 and previous config saved to /var/cache/conftool/dbconfig/20250728-064241-marostegui.json
  • 06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2038 T400436', diff saved to https://phabricator.wikimedia.org/P80008 and previous config saved to /var/cache/conftool/dbconfig/20250728-063039-root.json
  • 06:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2038.codfw.wmnet with reason: Maintenance

2025-07-27

  • 17:44 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 17:44 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync

2025-07-26

  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P80007 and previous config saved to /var/cache/conftool/dbconfig/20250726-103838-root.json
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P80006 and previous config saved to /var/cache/conftool/dbconfig/20250726-103833-root.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P80005 and previous config saved to /var/cache/conftool/dbconfig/20250726-102333-root.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P80004 and previous config saved to /var/cache/conftool/dbconfig/20250726-102327-root.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P80003 and previous config saved to /var/cache/conftool/dbconfig/20250726-100827-root.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P80002 and previous config saved to /var/cache/conftool/dbconfig/20250726-100821-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P80001 and previous config saved to /var/cache/conftool/dbconfig/20250726-095321-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P80000 and previous config saved to /var/cache/conftool/dbconfig/20250726-095315-root.json
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79999 and previous config saved to /var/cache/conftool/dbconfig/20250726-093815-root.json
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79998 and previous config saved to /var/cache/conftool/dbconfig/20250726-093810-root.json

2025-07-25

  • 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2196 T400514', diff saved to https://phabricator.wikimedia.org/P79997 and previous config saved to /var/cache/conftool/dbconfig/20250725-223736-marostegui.json
  • 22:36 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2215 to x1 primary T400514', diff saved to https://phabricator.wikimedia.org/P79996 and previous config saved to /var/cache/conftool/dbconfig/20250725-223622-marostegui.json
  • 22:35 marostegui: Starting x1 codfw failover from db2196 to db2215 - T400514
  • 22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2215 with weight 0 T400514', diff saved to https://phabricator.wikimedia.org/P79995 and previous config saved to /var/cache/conftool/dbconfig/20250725-222856-root.json
  • 22:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Primary switchover x1 T400514
  • 22:07 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2036.codfw.wmnet with OS bookworm
  • 21:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 21:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T399249)', diff saved to https://phabricator.wikimedia.org/P79994 and previous config saved to /var/cache/conftool/dbconfig/20250725-215314-marostegui.json
  • 21:44 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 21:41 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2036.codfw.wmnet with reason: host reimage
  • 21:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P79993 and previous config saved to /var/cache/conftool/dbconfig/20250725-213806-marostegui.json
  • 21:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host logstash2036
  • 21:23 cwhite@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2036
  • 21:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P79992 and previous config saved to /var/cache/conftool/dbconfig/20250725-212259-marostegui.json
  • 21:20 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS bookworm
  • 21:15 cwhite@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2036
  • 21:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2036.codfw.wmnet 54.16.192.10.in-addr.arpa 4.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:15 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2036.codfw.wmnet 54.16.192.10.in-addr.arpa 4.5.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 21:15 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:15 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2036 - cwhite@cumin2002"
  • 21:15 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2036 - cwhite@cumin2002"
  • 21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T399249)', diff saved to https://phabricator.wikimedia.org/P79991 and previous config saved to /var/cache/conftool/dbconfig/20250725-210752-marostegui.json
  • 21:07 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 21:07 cwhite@cumin2002: START - Cookbook sre.hosts.move-vlan for host logstash2036
  • 21:06 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2036.codfw.wmnet with OS bookworm
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T399249)', diff saved to https://phabricator.wikimedia.org/P79990 and previous config saved to /var/cache/conftool/dbconfig/20250725-210536-marostegui.json
  • 21:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T399249)', diff saved to https://phabricator.wikimedia.org/P79989 and previous config saved to /var/cache/conftool/dbconfig/20250725-210513-marostegui.json
  • 20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P79988 and previous config saved to /var/cache/conftool/dbconfig/20250725-205005-marostegui.json
  • 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P79987 and previous config saved to /var/cache/conftool/dbconfig/20250725-203458-marostegui.json
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T399249)', diff saved to https://phabricator.wikimedia.org/P79986 and previous config saved to /var/cache/conftool/dbconfig/20250725-201951-marostegui.json
  • 20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T399249)', diff saved to https://phabricator.wikimedia.org/P79985 and previous config saved to /var/cache/conftool/dbconfig/20250725-201735-marostegui.json
  • 20:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T399249)', diff saved to https://phabricator.wikimedia.org/P79984 and previous config saved to /var/cache/conftool/dbconfig/20250725-201711-marostegui.json
  • 20:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P79983 and previous config saved to /var/cache/conftool/dbconfig/20250725-200204-marostegui.json
  • 19:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P79982 and previous config saved to /var/cache/conftool/dbconfig/20250725-194657-marostegui.json
  • 19:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T399249)', diff saved to https://phabricator.wikimedia.org/P79981 and previous config saved to /var/cache/conftool/dbconfig/20250725-193149-marostegui.json
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T399249)', diff saved to https://phabricator.wikimedia.org/P79980 and previous config saved to /var/cache/conftool/dbconfig/20250725-192933-marostegui.json
  • 19:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T399249)', diff saved to https://phabricator.wikimedia.org/P79979 and previous config saved to /var/cache/conftool/dbconfig/20250725-192910-marostegui.json
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P79978 and previous config saved to /var/cache/conftool/dbconfig/20250725-191403-marostegui.json
  • 18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P79977 and previous config saved to /var/cache/conftool/dbconfig/20250725-185855-marostegui.json
  • 18:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T399249)', diff saved to https://phabricator.wikimedia.org/P79976 and previous config saved to /var/cache/conftool/dbconfig/20250725-184349-marostegui.json
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T399249)', diff saved to https://phabricator.wikimedia.org/P79975 and previous config saved to /var/cache/conftool/dbconfig/20250725-184133-marostegui.json
  • 18:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T399249)', diff saved to https://phabricator.wikimedia.org/P79974 and previous config saved to /var/cache/conftool/dbconfig/20250725-184110-marostegui.json
  • 18:35 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P79973 and previous config saved to /var/cache/conftool/dbconfig/20250725-182603-marostegui.json
  • 18:23 sbisson@deploy1003: Finished scap sync-world: Backport for Change how VE mobile toolbar is overridden (T400486) (duration: 09m 09s)
  • 18:17 sbisson@deploy1003: sbisson: Continuing with sync
  • 18:16 sbisson@deploy1003: sbisson: Backport for Change how VE mobile toolbar is overridden (T400486) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:14 sbisson@deploy1003: Started scap sync-world: Backport for Change how VE mobile toolbar is overridden (T400486)
  • 18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P79972 and previous config saved to /var/cache/conftool/dbconfig/20250725-181055-marostegui.json
  • 17:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T399249)', diff saved to https://phabricator.wikimedia.org/P79971 and previous config saved to /var/cache/conftool/dbconfig/20250725-175548-marostegui.json
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T399249)', diff saved to https://phabricator.wikimedia.org/P79970 and previous config saved to /var/cache/conftool/dbconfig/20250725-175332-marostegui.json
  • 17:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T399249)', diff saved to https://phabricator.wikimedia.org/P79969 and previous config saved to /var/cache/conftool/dbconfig/20250725-175310-marostegui.json
  • 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P79968 and previous config saved to /var/cache/conftool/dbconfig/20250725-173804-fceratto.json
  • 17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P79967 and previous config saved to /var/cache/conftool/dbconfig/20250725-173803-marostegui.json
  • 17:32 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 17:26 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bullseye
  • 17:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P79966 and previous config saved to /var/cache/conftool/dbconfig/20250725-172255-marostegui.json
  • 17:15 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 17:13 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:10 dancy@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 07s)
  • 17:09 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bullseye
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T399728)', diff saved to https://phabricator.wikimedia.org/P79965 and previous config saved to /var/cache/conftool/dbconfig/20250725-170749-fceratto.json
  • 17:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T399249)', diff saved to https://phabricator.wikimedia.org/P79964 and previous config saved to /var/cache/conftool/dbconfig/20250725-170748-marostegui.json
  • 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T399249)', diff saved to https://phabricator.wikimedia.org/P79963 and previous config saved to /var/cache/conftool/dbconfig/20250725-170532-marostegui.json
  • 17:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T399249)', diff saved to https://phabricator.wikimedia.org/P79962 and previous config saved to /var/cache/conftool/dbconfig/20250725-170509-marostegui.json
  • 17:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T399728)', diff saved to https://phabricator.wikimedia.org/P79961 and previous config saved to /var/cache/conftool/dbconfig/20250725-170254-fceratto.json
  • 17:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 16:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T399728)', diff saved to https://phabricator.wikimedia.org/P79960 and previous config saved to /var/cache/conftool/dbconfig/20250725-165923-fceratto.json
  • 16:59 dancy@deploy1003: Started scap build-images: Publishing wmf/next image
  • 16:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P79959 and previous config saved to /var/cache/conftool/dbconfig/20250725-165002-marostegui.json
  • 16:44 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P79958 and previous config saved to /var/cache/conftool/dbconfig/20250725-164416-fceratto.json
  • 16:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P79957 and previous config saved to /var/cache/conftool/dbconfig/20250725-163454-marostegui.json
  • 16:29 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2037.codfw.wmnet with OS bookworm
  • 16:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P79956 and previous config saved to /var/cache/conftool/dbconfig/20250725-162908-fceratto.json
  • 16:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T399249)', diff saved to https://phabricator.wikimedia.org/P79955 and previous config saved to /var/cache/conftool/dbconfig/20250725-161946-marostegui.json
  • 16:18 dancy@deploy1003: Installation of scap version "4.191.0" completed for 2 hosts
  • 16:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T399249)', diff saved to https://phabricator.wikimedia.org/P79954 and previous config saved to /var/cache/conftool/dbconfig/20250725-161730-marostegui.json
  • 16:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 16:17 dancy@deploy1003: Installing scap version "4.191.0" for 2 host(s)
  • 16:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T399249)', diff saved to https://phabricator.wikimedia.org/P79953 and previous config saved to /var/cache/conftool/dbconfig/20250725-161707-marostegui.json
  • 16:15 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T399728)', diff saved to https://phabricator.wikimedia.org/P79952 and previous config saved to /var/cache/conftool/dbconfig/20250725-161402-fceratto.json
  • 16:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 16:09 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 16:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T399728)', diff saved to https://phabricator.wikimedia.org/P79951 and previous config saved to /var/cache/conftool/dbconfig/20250725-160904-fceratto.json
  • 16:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 16:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T399728)', diff saved to https://phabricator.wikimedia.org/P79950 and previous config saved to /var/cache/conftool/dbconfig/20250725-160840-fceratto.json
  • 16:03 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 16:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P79949 and previous config saved to /var/cache/conftool/dbconfig/20250725-160200-marostegui.json
  • 15:56 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2037.codfw.wmnet with reason: host reimage
  • 15:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P79948 and previous config saved to /var/cache/conftool/dbconfig/20250725-155333-fceratto.json
  • 15:48 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P79947 and previous config saved to /var/cache/conftool/dbconfig/20250725-154652-marostegui.json
  • 15:46 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:44 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:41 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:41 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:39 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:38 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host logstash2037
  • 15:38 cwhite@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host logstash2037
  • 15:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P79946 and previous config saved to /var/cache/conftool/dbconfig/20250725-153825-fceratto.json
  • 15:37 cwhite@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host logstash2037
  • 15:37 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2037.codfw.wmnet 130.32.192.10.in-addr.arpa 0.3.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:37 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2037.codfw.wmnet 130.32.192.10.in-addr.arpa 0.3.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 15:37 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2037 - cwhite@cumin2002"
  • 15:37 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host logstash2037 - cwhite@cumin2002"
  • 15:35 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2037
  • 15:35 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2037
  • 15:33 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating es2037 to codfw - jhancock@cumin1003"
  • 15:33 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating es2037 to codfw - jhancock@cumin1003"
  • 15:31 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T399249)', diff saved to https://phabricator.wikimedia.org/P79945 and previous config saved to /var/cache/conftool/dbconfig/20250725-153145-marostegui.json
  • 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T399249)', diff saved to https://phabricator.wikimedia.org/P79944 and previous config saved to /var/cache/conftool/dbconfig/20250725-152930-marostegui.json
  • 15:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T399249)', diff saved to https://phabricator.wikimedia.org/P79943 and previous config saved to /var/cache/conftool/dbconfig/20250725-152906-marostegui.json
  • 15:27 cwhite@cumin2002: START - Cookbook sre.hosts.move-vlan for host logstash2037
  • 15:26 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2037.codfw.wmnet with OS bookworm
  • 15:26 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 15:25 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:24 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T399728)', diff saved to https://phabricator.wikimedia.org/P79942 and previous config saved to /var/cache/conftool/dbconfig/20250725-152318-fceratto.json
  • 15:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T399728)', diff saved to https://phabricator.wikimedia.org/P79941 and previous config saved to /var/cache/conftool/dbconfig/20250725-151823-fceratto.json
  • 15:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 15:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T399728)', diff saved to https://phabricator.wikimedia.org/P79940 and previous config saved to /var/cache/conftool/dbconfig/20250725-151801-fceratto.json
  • 15:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P79939 and previous config saved to /var/cache/conftool/dbconfig/20250725-151359-marostegui.json
  • 15:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P79938 and previous config saved to /var/cache/conftool/dbconfig/20250725-150253-fceratto.json
  • 14:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P79937 and previous config saved to /var/cache/conftool/dbconfig/20250725-145851-marostegui.json
  • 14:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P79936 and previous config saved to /var/cache/conftool/dbconfig/20250725-144746-fceratto.json
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T399249)', diff saved to https://phabricator.wikimedia.org/P79934 and previous config saved to /var/cache/conftool/dbconfig/20250725-144344-marostegui.json
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T399249)', diff saved to https://phabricator.wikimedia.org/P79933 and previous config saved to /var/cache/conftool/dbconfig/20250725-144329-marostegui.json
  • 14:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T399728)', diff saved to https://phabricator.wikimedia.org/P79932 and previous config saved to /var/cache/conftool/dbconfig/20250725-143238-fceratto.json
  • 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T399728)', diff saved to https://phabricator.wikimedia.org/P79931 and previous config saved to /var/cache/conftool/dbconfig/20250725-142730-fceratto.json
  • 14:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T399728)', diff saved to https://phabricator.wikimedia.org/P79930 and previous config saved to /var/cache/conftool/dbconfig/20250725-142708-fceratto.json
  • 14:20 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.2 - cmooney@cumin1003
  • 14:17 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.2 - cmooney@cumin1003
  • 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P79929 and previous config saved to /var/cache/conftool/dbconfig/20250725-141201-fceratto.json
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P79928 and previous config saved to /var/cache/conftool/dbconfig/20250725-135653-fceratto.json
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T399728)', diff saved to https://phabricator.wikimedia.org/P79926 and previous config saved to /var/cache/conftool/dbconfig/20250725-134145-fceratto.json
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T399728)', diff saved to https://phabricator.wikimedia.org/P79925 and previous config saved to /var/cache/conftool/dbconfig/20250725-133644-fceratto.json
  • 13:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T399728)', diff saved to https://phabricator.wikimedia.org/P79924 and previous config saved to /var/cache/conftool/dbconfig/20250725-133621-fceratto.json
  • 13:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:31 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P79922 and previous config saved to /var/cache/conftool/dbconfig/20250725-132113-fceratto.json
  • 13:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P79921 and previous config saved to /var/cache/conftool/dbconfig/20250725-130606-fceratto.json
  • 12:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T399728)', diff saved to https://phabricator.wikimedia.org/P79920 and previous config saved to /var/cache/conftool/dbconfig/20250725-125058-fceratto.json
  • 12:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T399728)', diff saved to https://phabricator.wikimedia.org/P79919 and previous config saved to /var/cache/conftool/dbconfig/20250725-124550-fceratto.json
  • 12:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 12:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T399728)', diff saved to https://phabricator.wikimedia.org/P79918 and previous config saved to /var/cache/conftool/dbconfig/20250725-124234-fceratto.json
  • 12:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P79917 and previous config saved to /var/cache/conftool/dbconfig/20250725-122727-fceratto.json
  • 12:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P79916 and previous config saved to /var/cache/conftool/dbconfig/20250725-121219-fceratto.json
  • 11:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T399728)', diff saved to https://phabricator.wikimedia.org/P79915 and previous config saved to /var/cache/conftool/dbconfig/20250725-115712-fceratto.json
  • 11:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T399728)', diff saved to https://phabricator.wikimedia.org/P79914 and previous config saved to /var/cache/conftool/dbconfig/20250725-115208-fceratto.json
  • 11:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T399728)', diff saved to https://phabricator.wikimedia.org/P79913 and previous config saved to /var/cache/conftool/dbconfig/20250725-115145-fceratto.json
  • 11:42 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.2 - cmooney@cumin1003
  • 11:39 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.2 - cmooney@cumin1003
  • 11:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P79912 and previous config saved to /var/cache/conftool/dbconfig/20250725-113638-fceratto.json
  • 11:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P79911 and previous config saved to /var/cache/conftool/dbconfig/20250725-112130-fceratto.json
  • 11:12 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.2 - cmooney@cumin1003
  • 11:10 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.2 - cmooney@cumin1003
  • 11:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T399728)', diff saved to https://phabricator.wikimedia.org/P79910 and previous config saved to /var/cache/conftool/dbconfig/20250725-110623-fceratto.json
  • 11:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T399728)', diff saved to https://phabricator.wikimedia.org/P79909 and previous config saved to /var/cache/conftool/dbconfig/20250725-110121-fceratto.json
  • 11:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 11:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T399728)', diff saved to https://phabricator.wikimedia.org/P79908 and previous config saved to /var/cache/conftool/dbconfig/20250725-110058-fceratto.json
  • 10:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P79907 and previous config saved to /var/cache/conftool/dbconfig/20250725-104551-fceratto.json
  • 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8657
  • 10:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P79906 and previous config saved to /var/cache/conftool/dbconfig/20250725-103043-fceratto.json
  • 10:30 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 8657
  • 10:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T399728)', diff saved to https://phabricator.wikimedia.org/P79905 and previous config saved to /var/cache/conftool/dbconfig/20250725-101536-fceratto.json
  • 10:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T399728)', diff saved to https://phabricator.wikimedia.org/P79904 and previous config saved to /var/cache/conftool/dbconfig/20250725-101017-fceratto.json
  • 10:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79903 and previous config saved to /var/cache/conftool/dbconfig/20250725-091715-root.json
  • 09:15 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79902 and previous config saved to /var/cache/conftool/dbconfig/20250725-090209-root.json
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79901 and previous config saved to /var/cache/conftool/dbconfig/20250725-084703-root.json
  • 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2221 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79900 and previous config saved to /var/cache/conftool/dbconfig/20250725-083158-root.json
  • 08:30 stevemunene@dns1004: END - running authdns-update
  • 08:29 stevemunene@dns1004: START - running authdns-update
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2221 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79899 and previous config saved to /var/cache/conftool/dbconfig/20250725-082430-marostegui.json
  • 08:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 08:03 eileen: config revision changed from 1e605150 to 1081636a
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79898 and previous config saved to /var/cache/conftool/dbconfig/20250725-070721-root.json
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79897 and previous config saved to /var/cache/conftool/dbconfig/20250725-065215-root.json
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79896 and previous config saved to /var/cache/conftool/dbconfig/20250725-063710-root.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2222 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79895 and previous config saved to /var/cache/conftool/dbconfig/20250725-062204-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2222 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79894 and previous config saved to /var/cache/conftool/dbconfig/20250725-061426-marostegui.json
  • 06:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 06:02 marostegui: Starting es6 codfw failover from es2037 to es2035 - T400436
  • 06:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2037.codfw.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2037 T400436', diff saved to https://phabricator.wikimedia.org/P79893 and previous config saved to /var/cache/conftool/dbconfig/20250725-060103-root.json
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2035 to es6 primary T400436', diff saved to https://phabricator.wikimedia.org/P79892 and previous config saved to /var/cache/conftool/dbconfig/20250725-060005-marostegui.json
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2035 with weight 0 T400436', diff saved to https://phabricator.wikimedia.org/P79891 and previous config saved to /var/cache/conftool/dbconfig/20250725-055749-root.json
  • 05:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Primary switchover es6 T400436
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2038 T400435', diff saved to https://phabricator.wikimedia.org/P79890 and previous config saved to /var/cache/conftool/dbconfig/20250725-055449-root.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2039 to es7 primary T400435', diff saved to https://phabricator.wikimedia.org/P79889 and previous config saved to /var/cache/conftool/dbconfig/20250725-055342-root.json
  • 05:53 marostegui: Starting es7 codfw failover from es2038 to es2039 - T400435
  • 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2039 with weight 0 T400435', diff saved to https://phabricator.wikimedia.org/P79888 and previous config saved to /var/cache/conftool/dbconfig/20250725-055105-root.json
  • 05:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Primary switchover es7 T400435
  • 05:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 04:52 eileen: civicrm upgraded from 3c23a5c0 to 8d57b366
  • 00:01 ejegg: payments-wiki upgraded from eeff4fda to b942983f
  • 00:00 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1022.eqiad.wmnet with OS bookworm

2025-07-24

  • 23:11 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 23:04 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 23:04 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1022.eqiad.wmnet with OS bookworm
  • 22:42 logmsgbot: dreamyjazz Deployed security patch for T399093
  • 22:31 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Make SecurePoll channel log warnings (duration: 08m 03s)
  • 22:26 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 22:25 dreamyjazz@deploy1003: dreamyjazz: Backport for Make SecurePoll channel log warnings synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:23 dreamyjazz@deploy1003: Started scap sync-world: Backport for Make SecurePoll channel log warnings
  • 22:15 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1032.eqiad.wmnet with OS bookworm
  • 22:15 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 22:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 22:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T399249)', diff saved to https://phabricator.wikimedia.org/P79887 and previous config saved to /var/cache/conftool/dbconfig/20250724-221439-marostegui.json
  • 22:12 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:08 dreamyjazz@deploy1003: Finished scap sync-world: (no justification provided) (duration: 03m 36s)
  • 22:07 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:04 dreamyjazz@deploy1003: Started scap sync-world: (no justification provided)
  • 22:02 dancy@deploy1003: Installation of scap version "4.191.0" completed for 2 hosts
  • 22:00 dancy@deploy1003: Installing scap version "4.191.0" for 2 host(s)
  • 21:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P79886 and previous config saved to /var/cache/conftool/dbconfig/20250724-215931-marostegui.json
  • 21:51 logmsgbot: dreamyjazz Deployed security patch for T399093
  • 21:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P79885 and previous config saved to /var/cache/conftool/dbconfig/20250724-214423-marostegui.json
  • 21:41 ejegg: standalone (IPN listener) SmashPig upgraded from de30a87f to de092313
  • 21:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T399249)', diff saved to https://phabricator.wikimedia.org/P79884 and previous config saved to /var/cache/conftool/dbconfig/20250724-212916-marostegui.json
  • 21:23 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on most wikis (T397912) (duration: 09m 28s)
  • 21:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1032.eqiad.wmnet with reason: host reimage
  • 21:18 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1032.eqiad.wmnet with reason: host reimage
  • 21:17 zabe@deploy1003: zabe: Continuing with sync
  • 21:16 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on most wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:14 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on most wikis (T397912)
  • 21:00 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1032.eqiad.wmnet with OS bookworm
  • 20:50 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1031.eqiad.wmnet with OS bookworm
  • 20:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T399249)', diff saved to https://phabricator.wikimedia.org/P79883 and previous config saved to /var/cache/conftool/dbconfig/20250724-204543-marostegui.json
  • 20:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 20:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T399249)', diff saved to https://phabricator.wikimedia.org/P79882 and previous config saved to /var/cache/conftool/dbconfig/20250724-204521-marostegui.json
  • 20:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P79881 and previous config saved to /var/cache/conftool/dbconfig/20250724-203013-marostegui.json
  • 20:23 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1031.eqiad.wmnet with reason: host reimage
  • 20:18 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1031.eqiad.wmnet with reason: host reimage
  • 20:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P79880 and previous config saved to /var/cache/conftool/dbconfig/20250724-201506-marostegui.json
  • 20:12 zabe@deploy1003: Finished scap sync-world: Backport for Enable the CampaignEvents extension on wikimaniawiki (T397369) (duration: 09m 46s)
  • 20:06 zabe@deploy1003: zabe, daimona: Continuing with sync
  • 20:04 zabe@deploy1003: zabe, daimona: Backport for Enable the CampaignEvents extension on wikimaniawiki (T397369) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1031.eqiad.wmnet with OS bookworm
  • 20:02 zabe@deploy1003: Started scap sync-world: Backport for Enable the CampaignEvents extension on wikimaniawiki (T397369)
  • 20:00 brett@dns1004: END - running authdns-update
  • 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T399249)', diff saved to https://phabricator.wikimedia.org/P79879 and previous config saved to /var/cache/conftool/dbconfig/20250724-195958-marostegui.json
  • 19:58 brett@dns1004: START - running authdns-update
  • 19:45 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1030.eqiad.wmnet with OS bookworm
  • 19:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1022.eqiad.wmnet with OS bookworm
  • 19:08 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1030.eqiad.wmnet with reason: host reimage
  • 19:03 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1030.eqiad.wmnet with reason: host reimage
  • 18:54 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 18:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T399249)', diff saved to https://phabricator.wikimedia.org/P79878 and previous config saved to /var/cache/conftool/dbconfig/20250724-185343-marostegui.json
  • 18:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 18:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T399249)', diff saved to https://phabricator.wikimedia.org/P79877 and previous config saved to /var/cache/conftool/dbconfig/20250724-185320-marostegui.json
  • 18:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host clouddb1022.eqiad.wmnet with OS bookworm
  • 18:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79876 and previous config saved to /var/cache/conftool/dbconfig/20250724-184815-root.json
  • 18:45 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1030.eqiad.wmnet with OS bookworm
  • 18:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P79875 and previous config saved to /var/cache/conftool/dbconfig/20250724-183813-marostegui.json
  • 18:33 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79874 and previous config saved to /var/cache/conftool/dbconfig/20250724-183309-root.json
  • 18:29 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1025.eqiad.wmnet with OS bookworm
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P79873 and previous config saved to /var/cache/conftool/dbconfig/20250724-182306-marostegui.json
  • 18:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79872 and previous config saved to /var/cache/conftool/dbconfig/20250724-181803-root.json
  • 18:15 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host clouddb1022.eqiad.wmnet with OS bookworm
  • 18:13 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:12 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.11 refs T396372
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T399249)', diff saved to https://phabricator.wikimedia.org/P79871 and previous config saved to /var/cache/conftool/dbconfig/20250724-180758-marostegui.json
  • 18:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79870 and previous config saved to /var/cache/conftool/dbconfig/20250724-180258-root.json
  • 17:58 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1025.eqiad.wmnet with reason: host reimage
  • 17:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 17:52 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1025.eqiad.wmnet with reason: host reimage
  • 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T399728)', diff saved to https://phabricator.wikimedia.org/P79869 and previous config saved to /var/cache/conftool/dbconfig/20250724-175242-fceratto.json
  • 17:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79868 and previous config saved to /var/cache/conftool/dbconfig/20250724-174752-root.json
  • 17:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1025.eqiad.wmnet with OS bookworm
  • 17:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P79867 and previous config saved to /var/cache/conftool/dbconfig/20250724-173734-fceratto.json
  • 17:34 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1024.eqiad.wmnet with OS bookworm
  • 17:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 17:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P79866 and previous config saved to /var/cache/conftool/dbconfig/20250724-172227-fceratto.json
  • 17:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T399249)', diff saved to https://phabricator.wikimedia.org/P79865 and previous config saved to /var/cache/conftool/dbconfig/20250724-172140-marostegui.json
  • 17:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T399249)', diff saved to https://phabricator.wikimedia.org/P79864 and previous config saved to /var/cache/conftool/dbconfig/20250724-172117-marostegui.json
  • 17:17 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2036
  • 17:17 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2036
  • 17:16 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T399728)', diff saved to https://phabricator.wikimedia.org/P79863 and previous config saved to /var/cache/conftool/dbconfig/20250724-170719-fceratto.json
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P79862 and previous config saved to /var/cache/conftool/dbconfig/20250724-170608-marostegui.json
  • 16:53 hnowlan: delete thumbor pod where all instances displayed signs of T374350
  • 16:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T399728)', diff saved to https://phabricator.wikimedia.org/P79860 and previous config saved to /var/cache/conftool/dbconfig/20250724-165228-fceratto.json
  • 16:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 16:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T399728)', diff saved to https://phabricator.wikimedia.org/P79859 and previous config saved to /var/cache/conftool/dbconfig/20250724-165205-fceratto.json
  • 16:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P79858 and previous config saved to /var/cache/conftool/dbconfig/20250724-165100-marostegui.json
  • 16:48 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1024.eqiad.wmnet with reason: host reimage
  • 16:43 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1024.eqiad.wmnet with reason: host reimage
  • 16:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P79857 and previous config saved to /var/cache/conftool/dbconfig/20250724-163658-fceratto.json
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T399249)', diff saved to https://phabricator.wikimedia.org/P79856 and previous config saved to /var/cache/conftool/dbconfig/20250724-163553-marostegui.json
  • 16:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2036 T399927', diff saved to https://phabricator.wikimedia.org/P79855 and previous config saved to /var/cache/conftool/dbconfig/20250724-163439-root.json
  • 16:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2036.codfw.wmnet with reason: Maintenance
  • 16:27 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1024.eqiad.wmnet with OS bookworm
  • 16:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P79854 and previous config saved to /var/cache/conftool/dbconfig/20250724-162150-fceratto.json
  • 16:08 dancy@deploy1003: Installation of scap version "4.190.0" completed for 2 hosts
  • 16:06 dancy@deploy1003: Installing scap version "4.190.0" for 2 host(s)
  • 16:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T399728)', diff saved to https://phabricator.wikimedia.org/P79852 and previous config saved to /var/cache/conftool/dbconfig/20250724-160643-fceratto.json
  • 15:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2079']
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T399249)', diff saved to https://phabricator.wikimedia.org/P79851 and previous config saved to /var/cache/conftool/dbconfig/20250724-155206-marostegui.json
  • 15:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T399728)', diff saved to https://phabricator.wikimedia.org/P79850 and previous config saved to /var/cache/conftool/dbconfig/20250724-155151-fceratto.json
  • 15:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T399249)', diff saved to https://phabricator.wikimedia.org/P79849 and previous config saved to /var/cache/conftool/dbconfig/20250724-155144-marostegui.json
  • 15:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 15:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T399728)', diff saved to https://phabricator.wikimedia.org/P79848 and previous config saved to /var/cache/conftool/dbconfig/20250724-155128-fceratto.json
  • 15:51 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2079']
  • 15:48 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 15:48 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 15:46 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 15:37 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cirrussearch2079']
  • 15:37 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2079']
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P79847 and previous config saved to /var/cache/conftool/dbconfig/20250724-153637-marostegui.json
  • 15:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P79846 and previous config saved to /var/cache/conftool/dbconfig/20250724-153620-fceratto.json
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P79845 and previous config saved to /var/cache/conftool/dbconfig/20250724-152129-marostegui.json
  • 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P79844 and previous config saved to /var/cache/conftool/dbconfig/20250724-152113-fceratto.json
  • 15:13 swfrench-wmf: reprepro include php-xhprof_2.3.10-1+wmf11u1 tideways_5.0.4-16+wmf11u2 in component/php83 - T398245
  • 15:13 swfrench-wmf: reprepro include php-xhprof_2.3.10-1+wmf11u1 in component/php81 - T398245
  • 15:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T399249)', diff saved to https://phabricator.wikimedia.org/P79843 and previous config saved to /var/cache/conftool/dbconfig/20250724-150622-marostegui.json
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T399728)', diff saved to https://phabricator.wikimedia.org/P79842 and previous config saved to /var/cache/conftool/dbconfig/20250724-150605-fceratto.json
  • 15:05 krinkle@deploy1003: Finished scap sync-world: Backport for build: Fix failing `phpcs` in CI on commits updating interwiki.php, Update interwiki cache (duration: 08m 48s)
  • 15:01 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 15:00 krinkle@deploy1003: krinkle: Continuing with sync
  • 14:59 krinkle@deploy1003: krinkle: Backport for build: Fix failing `phpcs` in CI on commits updating interwiki.php, Update interwiki cache synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:57 krinkle@deploy1003: Started scap sync-world: Backport for build: Fix failing `phpcs` in CI on commits updating interwiki.php, Update interwiki cache
  • 14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T399728)', diff saved to https://phabricator.wikimedia.org/P79841 and previous config saved to /var/cache/conftool/dbconfig/20250724-145112-fceratto.json
  • 14:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 14:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T399728)', diff saved to https://phabricator.wikimedia.org/P79840 and previous config saved to /var/cache/conftool/dbconfig/20250724-145048-fceratto.json
  • 14:44 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch2091.codfw.wmnet with reason: host reimage
  • 14:41 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch2091.codfw.wmnet with reason: host reimage
  • 14:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P79839 and previous config saved to /var/cache/conftool/dbconfig/20250724-143541-fceratto.json
  • 14:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:24 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 14:24 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cirrussearch2079.codfw.wmnet with reason: T396718
  • 14:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P79838 and previous config saved to /var/cache/conftool/dbconfig/20250724-142033-fceratto.json
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T399249)', diff saved to https://phabricator.wikimedia.org/P79837 and previous config saved to /var/cache/conftool/dbconfig/20250724-142024-marostegui.json
  • 14:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T399249)', diff saved to https://phabricator.wikimedia.org/P79836 and previous config saved to /var/cache/conftool/dbconfig/20250724-142001-marostegui.json
  • 14:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:09 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T399728)', diff saved to https://phabricator.wikimedia.org/P79834 and previous config saved to /var/cache/conftool/dbconfig/20250724-140519-fceratto.json
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P79833 and previous config saved to /var/cache/conftool/dbconfig/20250724-140454-marostegui.json
  • 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T399728)', diff saved to https://phabricator.wikimedia.org/P79832 and previous config saved to /var/cache/conftool/dbconfig/20250724-135027-fceratto.json
  • 13:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T399728)', diff saved to https://phabricator.wikimedia.org/P79831 and previous config saved to /var/cache/conftool/dbconfig/20250724-135004-fceratto.json
  • 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P79830 and previous config saved to /var/cache/conftool/dbconfig/20250724-134946-marostegui.json
  • 13:46 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Make TSP extensions have warning logs in logstash (duration: 08m 51s)
  • 13:43 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search,name=codfw
  • 13:40 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 13:39 dreamyjazz@deploy1003: dreamyjazz: Backport for Make TSP extensions have warning logs in logstash synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:37 dreamyjazz@deploy1003: Started scap sync-world: Backport for Make TSP extensions have warning logs in logstash
  • 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P79829 and previous config saved to /var/cache/conftool/dbconfig/20250724-133456-fceratto.json
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T399249)', diff saved to https://phabricator.wikimedia.org/P79828 and previous config saved to /var/cache/conftool/dbconfig/20250724-133439-marostegui.json
  • 13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P79826 and previous config saved to /var/cache/conftool/dbconfig/20250724-131949-fceratto.json
  • 13:19 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:18 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Deploy Readers Use Cases Survey v2 (T399736) (duration: 12m 07s)
  • 13:10 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, dani: Continuing with sync
  • 13:09 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, dani: Backport for Deploy Readers Use Cases Survey v2 (T399736) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1013.eqiad.wmnet with OS trixie
  • 13:09 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 13:08 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 13:05 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Deploy Readers Use Cases Survey v2 (T399736)
  • 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T399728)', diff saved to https://phabricator.wikimedia.org/P79824 and previous config saved to /var/cache/conftool/dbconfig/20250724-130441-fceratto.json
  • 12:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 12:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T399728)', diff saved to https://phabricator.wikimedia.org/P79823 and previous config saved to /var/cache/conftool/dbconfig/20250724-125017-fceratto.json
  • 12:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T399728)', diff saved to https://phabricator.wikimedia.org/P79822 and previous config saved to /var/cache/conftool/dbconfig/20250724-124944-fceratto.json
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T399249)', diff saved to https://phabricator.wikimedia.org/P79821 and previous config saved to /var/cache/conftool/dbconfig/20250724-124904-marostegui.json
  • 12:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T399249)', diff saved to https://phabricator.wikimedia.org/P79820 and previous config saved to /var/cache/conftool/dbconfig/20250724-124842-marostegui.json
  • 12:47 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 12:40 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:35 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS trixie
  • 12:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P79819 and previous config saved to /var/cache/conftool/dbconfig/20250724-123437-fceratto.json
  • 12:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P79818 and previous config saved to /var/cache/conftool/dbconfig/20250724-123334-marostegui.json
  • 12:31 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on gerrit2002.wikimedia.org with reason: maintenance
  • 12:29 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit2003.wikimedia.org with reason: maintenance
  • 12:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P79817 and previous config saved to /var/cache/conftool/dbconfig/20250724-121930-fceratto.json
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P79816 and previous config saved to /var/cache/conftool/dbconfig/20250724-121827-marostegui.json
  • 12:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T399728)', diff saved to https://phabricator.wikimedia.org/P79815 and previous config saved to /var/cache/conftool/dbconfig/20250724-120422-fceratto.json
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T399249)', diff saved to https://phabricator.wikimedia.org/P79814 and previous config saved to /var/cache/conftool/dbconfig/20250724-120319-marostegui.json
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T399728)', diff saved to https://phabricator.wikimedia.org/P79813 and previous config saved to /var/cache/conftool/dbconfig/20250724-114957-fceratto.json
  • 11:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T399728)', diff saved to https://phabricator.wikimedia.org/P79812 and previous config saved to /var/cache/conftool/dbconfig/20250724-114934-fceratto.json
  • 11:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P79810 and previous config saved to /var/cache/conftool/dbconfig/20250724-113427-fceratto.json
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T399249)', diff saved to https://phabricator.wikimedia.org/P79809 and previous config saved to /var/cache/conftool/dbconfig/20250724-112008-marostegui.json
  • 11:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P79808 and previous config saved to /var/cache/conftool/dbconfig/20250724-111919-fceratto.json
  • 11:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T399728)', diff saved to https://phabricator.wikimedia.org/P79807 and previous config saved to /var/cache/conftool/dbconfig/20250724-110412-fceratto.json
  • 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T399728)', diff saved to https://phabricator.wikimedia.org/P79806 and previous config saved to /var/cache/conftool/dbconfig/20250724-104938-fceratto.json
  • 10:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T399249)', diff saved to https://phabricator.wikimedia.org/P79805 and previous config saved to /var/cache/conftool/dbconfig/20250724-101721-marostegui.json
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P79804 and previous config saved to /var/cache/conftool/dbconfig/20250724-100213-marostegui.json
  • 09:58 cgoubert@dns1004: END - running authdns-update
  • 09:58 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 09:58 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 09:57 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 09:57 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:57 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
  • 09:57 cgoubert@dns1004: START - running authdns-update
  • 09:55 hnowlan@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:54 hnowlan@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P79803 and previous config saved to /var/cache/conftool/dbconfig/20250724-094706-marostegui.json
  • 09:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 09:37 vgutierrez: disable BGP for lvs1013 on lsw1-e1-eqiad.mgmt.eqiad.wmnet - T400259
  • 09:36 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T399249)', diff saved to https://phabricator.wikimedia.org/P79801 and previous config saved to /var/cache/conftool/dbconfig/20250724-093158-marostegui.json
  • 09:22 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 09:13 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs1013.eqiad.wmnet} and A:liberica (T400259)
  • 09:12 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs1013.eqiad.wmnet} and A:liberica (T400259)
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T399249)', diff saved to https://phabricator.wikimedia.org/P79800 and previous config saved to /var/cache/conftool/dbconfig/20250724-082213-marostegui.json
  • 08:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T399249)', diff saved to https://phabricator.wikimedia.org/P79799 and previous config saved to /var/cache/conftool/dbconfig/20250724-082150-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P79798 and previous config saved to /var/cache/conftool/dbconfig/20250724-080643-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79797 and previous config saved to /var/cache/conftool/dbconfig/20250724-080617-root.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P79796 and previous config saved to /var/cache/conftool/dbconfig/20250724-075135-marostegui.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79795 and previous config saved to /var/cache/conftool/dbconfig/20250724-075112-root.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T399249)', diff saved to https://phabricator.wikimedia.org/P79794 and previous config saved to /var/cache/conftool/dbconfig/20250724-073628-marostegui.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79793 and previous config saved to /var/cache/conftool/dbconfig/20250724-073606-root.json
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79792 and previous config saved to /var/cache/conftool/dbconfig/20250724-072100-root.json
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1227 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79791 and previous config saved to /var/cache/conftool/dbconfig/20250724-071300-marostegui.json
  • 07:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T399249)', diff saved to https://phabricator.wikimedia.org/P79790 and previous config saved to /var/cache/conftool/dbconfig/20250724-065222-marostegui.json
  • 06:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79789 and previous config saved to /var/cache/conftool/dbconfig/20250724-063300-root.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79788 and previous config saved to /var/cache/conftool/dbconfig/20250724-061755-root.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79787 and previous config saved to /var/cache/conftool/dbconfig/20250724-060249-root.json
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79786 and previous config saved to /var/cache/conftool/dbconfig/20250724-054743-root.json
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79785 and previous config saved to /var/cache/conftool/dbconfig/20250724-053236-root.json
  • 05:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 01:28 ryankemper: [Cirrus] `ryankemper@cirrussearch2071:~$ sudo systemctl restart opensearch-disable-readahead-production-search-psi-codfw.service`
  • 01:01 ryankemper@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - ryankemper@cumin1002 - T397227

2025-07-23

  • 23:54 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20250723
  • 23:49 ryankemper@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - ryankemper@cumin1002 - T397227
  • 23:46 ryankemper: [Cirrus] Depooled codfw in anticipation of rolling restart. Hopefully minimal noise on this one :)
  • 23:46 ryankemper@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search,name=codfw
  • 23:15 inflatador: pool cirrussearch eqiad, will resume investigations tomorrow T400160
  • 23:14 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 23:08 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 55 hosts with reason: testing cluster quorum
  • 22:53 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 22:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host clouddb1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:57 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host clouddb1022
  • 21:56 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host clouddb1022
  • 21:55 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:55 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1022 - vriley@cumin1002"
  • 21:55 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1022 - vriley@cumin1002"
  • 21:52 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:15 cscott@deploy1003: Finished scap sync-world: Backport for Create "report visual bug" dialog (T365371), Disable ParserMigration indicator and user notice (T363484 T363472) (duration: 40m 57s)
  • 21:02 cscott@deploy1003: cscott: Continuing with sync
  • 20:58 cscott@deploy1003: cscott: Backport for Create "report visual bug" dialog (T365371), Disable ParserMigration indicator and user notice (T363484 T363472) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T399728)', diff saved to https://phabricator.wikimedia.org/P79784 and previous config saved to /var/cache/conftool/dbconfig/20250723-205548-fceratto.json
  • 20:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P79783 and previous config saved to /var/cache/conftool/dbconfig/20250723-204041-fceratto.json
  • 20:38 eileen: civicrm upgraded from 3c23a5c0 to fccd9ef9
  • 20:37 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash1023.eqiad.wmnet with OS bookworm
  • 20:34 cscott@deploy1003: Started scap sync-world: Backport for Create "report visual bug" dialog (T365371), Disable ParserMigration indicator and user notice (T363484 T363472)
  • 20:32 cscott@deploy1003: Finished scap sync-world: Backport for Enable the "Report Visual Bug" feature of Extension:ParserMigration (T365371) (duration: 10m 32s)
  • 20:30 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:29 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:29 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:29 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:29 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 20:28 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 20:26 cscott@deploy1003: cscott: Continuing with sync
  • 20:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P79781 and previous config saved to /var/cache/conftool/dbconfig/20250723-202533-fceratto.json
  • 20:23 cscott@deploy1003: cscott: Backport for Enable the "Report Visual Bug" feature of Extension:ParserMigration (T365371) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:21 cscott@deploy1003: Started scap sync-world: Backport for Enable the "Report Visual Bug" feature of Extension:ParserMigration (T365371)
  • 20:18 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest1003.eqiad.wmnet with reason: redfish-test
  • 20:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T399728)', diff saved to https://phabricator.wikimedia.org/P79780 and previous config saved to /var/cache/conftool/dbconfig/20250723-201025-fceratto.json
  • 20:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T399728)', diff saved to https://phabricator.wikimedia.org/P79779 and previous config saved to /var/cache/conftool/dbconfig/20250723-200722-fceratto.json
  • 20:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 20:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T399728)', diff saved to https://phabricator.wikimedia.org/P79778 and previous config saved to /var/cache/conftool/dbconfig/20250723-200659-fceratto.json
  • 20:02 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1023.eqiad.wmnet with reason: host reimage
  • 19:57 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1023.eqiad.wmnet with reason: host reimage
  • 19:57 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ml-serve1012.eqiad.wmnet with reason: redfish-test
  • 19:53 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: redfish-test
  • 19:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P79777 and previous config saved to /var/cache/conftool/dbconfig/20250723-195152-fceratto.json
  • 19:41 kharlan@deploy1003: Finished scap sync-world: Backport for AuthManager: Move temp account login to continueAuthentication (T398270) (duration: 11m 39s)
  • 19:41 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash1023.eqiad.wmnet with OS bookworm
  • 19:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P79776 and previous config saved to /var/cache/conftool/dbconfig/20250723-193644-fceratto.json
  • 19:36 kharlan@deploy1003: kharlan: Continuing with sync
  • 19:32 kharlan@deploy1003: kharlan: Backport for AuthManager: Move temp account login to continueAuthentication (T398270) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:30 kharlan@deploy1003: Started scap sync-world: Backport for AuthManager: Move temp account login to continueAuthentication (T398270)
  • 19:29 mutante: gitlab-runner* - apt-get upgrade - upgrading gitlab-runner, libgnutls30, ca-certificates
  • 19:28 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 19:26 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20250723
  • 19:24 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release 20250723
  • 19:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T399728)', diff saved to https://phabricator.wikimedia.org/P79775 and previous config saved to /var/cache/conftool/dbconfig/20250723-192136-fceratto.json
  • 19:20 bking@cumin1002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 19:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T399728)', diff saved to https://phabricator.wikimedia.org/P79774 and previous config saved to /var/cache/conftool/dbconfig/20250723-191841-fceratto.json
  • 19:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 19:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T399728)', diff saved to https://phabricator.wikimedia.org/P79773 and previous config saved to /var/cache/conftool/dbconfig/20250723-191817-fceratto.json
  • 19:16 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20250723
  • 19:14 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 19:14 bking@cumin1002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release 20250723
  • 19:06 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:04 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P79772 and previous config saved to /var/cache/conftool/dbconfig/20250723-190309-fceratto.json
  • 19:03 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20250723
  • 19:01 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 19:01 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:00 ottomata: deploying eventgate-analytics-external and eventgate-logging-external to get meta.dt logic change - T376026
  • 18:59 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:59 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:58 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:52 inflatador: depool cirrussearch eqiad in preparation for rolling restart T399162
  • 18:51 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=search,name=eqiad
  • 18:50 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.11 refs T396372
  • 18:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P79771 and previous config saved to /var/cache/conftool/dbconfig/20250723-184801-fceratto.json
  • 18:47 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:47 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:43 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:42 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T399728)', diff saved to https://phabricator.wikimedia.org/P79770 and previous config saved to /var/cache/conftool/dbconfig/20250723-183254-fceratto.json
  • 18:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T399728)', diff saved to https://phabricator.wikimedia.org/P79769 and previous config saved to /var/cache/conftool/dbconfig/20250723-182951-fceratto.json
  • 18:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 18:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T399728)', diff saved to https://phabricator.wikimedia.org/P79768 and previous config saved to /var/cache/conftool/dbconfig/20250723-182928-fceratto.json
  • 18:18 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2032.codfw.wmnet with OS bookworm
  • 18:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P79767 and previous config saved to /var/cache/conftool/dbconfig/20250723-181420-fceratto.json
  • 17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P79766 and previous config saved to /var/cache/conftool/dbconfig/20250723-175912-fceratto.json
  • 17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T399728)', diff saved to https://phabricator.wikimedia.org/P79765 and previous config saved to /var/cache/conftool/dbconfig/20250723-174405-fceratto.json
  • 17:42 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8309
  • 17:41 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 8309
  • 17:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T399728)', diff saved to https://phabricator.wikimedia.org/P79764 and previous config saved to /var/cache/conftool/dbconfig/20250723-174104-fceratto.json
  • 17:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 17:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 17:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T399728)', diff saved to https://phabricator.wikimedia.org/P79763 and previous config saved to /var/cache/conftool/dbconfig/20250723-173930-fceratto.json
  • 17:37 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63516
  • 17:37 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 63516
  • 17:25 swfrench-wmf: deleted tags for docker-registry.discovery.wmnet/httpd-bookworm - T378128
  • 17:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P79762 and previous config saved to /var/cache/conftool/dbconfig/20250723-172423-fceratto.json
  • 17:24 swfrench-wmf: deleted tags for docker-registry.discovery.wmnet/httpd-fcgi-bookworm - T378128
  • 17:22 swfrench-wmf: deleted tags for docker-registry.discovery.wmnet/mediawiki-httpd-bookworm - T378128
  • 17:11 swfrench@deploy1003: Finished scap sync-world: Deploy to remove php-ldap from debug images (duration: 03m 29s)
  • 17:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P79761 and previous config saved to /var/cache/conftool/dbconfig/20250723-170915-fceratto.json
  • 17:08 swfrench@deploy1003: Started scap sync-world: Deploy to remove php-ldap from debug images
  • 16:58 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2032.codfw.wmnet with reason: host reimage
  • 16:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T399728)', diff saved to https://phabricator.wikimedia.org/P79759 and previous config saved to /var/cache/conftool/dbconfig/20250723-165407-fceratto.json
  • 16:53 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2032.codfw.wmnet with reason: host reimage
  • 16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T399728)', diff saved to https://phabricator.wikimedia.org/P79758 and previous config saved to /var/cache/conftool/dbconfig/20250723-165106-fceratto.json
  • 16:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 16:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T399728)', diff saved to https://phabricator.wikimedia.org/P79757 and previous config saved to /var/cache/conftool/dbconfig/20250723-165043-fceratto.json
  • 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P79756 and previous config saved to /var/cache/conftool/dbconfig/20250723-163536-fceratto.json
  • 16:31 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2032.codfw.wmnet with OS bookworm
  • 16:30 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2031.codfw.wmnet with OS bookworm
  • 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P79755 and previous config saved to /var/cache/conftool/dbconfig/20250723-162028-fceratto.json
  • 16:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T399728)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250723-160516-fceratto.json
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T399728)', diff saved to https://phabricator.wikimedia.org/P79754 and previous config saved to /var/cache/conftool/dbconfig/20250723-160215-fceratto.json
  • 16:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T399728)', diff saved to https://phabricator.wikimedia.org/P79753 and previous config saved to /var/cache/conftool/dbconfig/20250723-160152-fceratto.json
  • 15:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P79752 and previous config saved to /var/cache/conftool/dbconfig/20250723-154645-fceratto.json
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P79751 and previous config saved to /var/cache/conftool/dbconfig/20250723-153137-fceratto.json
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T399728)', diff saved to https://phabricator.wikimedia.org/P79750 and previous config saved to /var/cache/conftool/dbconfig/20250723-151630-fceratto.json
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T399728)', diff saved to https://phabricator.wikimedia.org/P79749 and previous config saved to /var/cache/conftool/dbconfig/20250723-151325-fceratto.json
  • 15:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2031.codfw.wmnet with reason: host reimage
  • 14:58 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2031.codfw.wmnet with reason: host reimage
  • 14:46 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 14:46 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 14:38 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2031.codfw.wmnet with OS bookworm
  • 14:37 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2030.codfw.wmnet with OS bookworm
  • 14:27 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:27 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T399728)', diff saved to https://phabricator.wikimedia.org/P79746 and previous config saved to /var/cache/conftool/dbconfig/20250723-141353-fceratto.json
  • 14:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:04 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:04 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:03 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2030.codfw.wmnet with reason: host reimage
  • 14:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P79745 and previous config saved to /var/cache/conftool/dbconfig/20250723-135846-fceratto.json
  • 13:58 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2030.codfw.wmnet with reason: host reimage
  • 13:45 mszabo@deploy1003: Finished scap sync-world: Backport for Enable wgWikimediaEventsCreateAccountInstrumentation (T394744) (duration: 09m 31s)
  • 13:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P79743 and previous config saved to /var/cache/conftool/dbconfig/20250723-134338-fceratto.json
  • 13:40 mszabo@deploy1003: mszabo: Continuing with sync
  • 13:39 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2030.codfw.wmnet with OS bookworm
  • 13:38 mszabo@deploy1003: mszabo: Backport for Enable wgWikimediaEventsCreateAccountInstrumentation (T394744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:36 mszabo@deploy1003: Started scap sync-world: Backport for Enable wgWikimediaEventsCreateAccountInstrumentation (T394744)
  • 13:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ssw1-d1-eqiad mgmt - ayounsi@cumin1003"
  • 13:35 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ssw1-d1-eqiad mgmt - ayounsi@cumin1003"
  • 13:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T399728)', diff saved to https://phabricator.wikimedia.org/P79741 and previous config saved to /var/cache/conftool/dbconfig/20250723-132831-fceratto.json
  • 13:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 13:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T399728)', diff saved to https://phabricator.wikimedia.org/P79740 and previous config saved to /var/cache/conftool/dbconfig/20250723-132548-fceratto.json
  • 13:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 13:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T399728)', diff saved to https://phabricator.wikimedia.org/P79739 and previous config saved to /var/cache/conftool/dbconfig/20250723-132525-fceratto.json
  • 13:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P79738 and previous config saved to /var/cache/conftool/dbconfig/20250723-131018-fceratto.json
  • 13:00 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 12:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 12:57 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:57 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P79737 and previous config saved to /var/cache/conftool/dbconfig/20250723-125510-fceratto.json
  • 12:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T399728)', diff saved to https://phabricator.wikimedia.org/P79736 and previous config saved to /var/cache/conftool/dbconfig/20250723-124003-fceratto.json
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T399728)', diff saved to https://phabricator.wikimedia.org/P79735 and previous config saved to /var/cache/conftool/dbconfig/20250723-123722-fceratto.json
  • 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T399728)', diff saved to https://phabricator.wikimedia.org/P79734 and previous config saved to /var/cache/conftool/dbconfig/20250723-123659-fceratto.json
  • 12:32 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:31 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:29 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:29 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:28 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:28 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:28 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:27 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:27 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:27 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:27 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:25 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:25 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:24 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:23 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P79733 and previous config saved to /var/cache/conftool/dbconfig/20250723-122152-fceratto.json
  • 12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P79732 and previous config saved to /var/cache/conftool/dbconfig/20250723-120645-fceratto.json
  • 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T399728)', diff saved to https://phabricator.wikimedia.org/P79731 and previous config saved to /var/cache/conftool/dbconfig/20250723-115137-fceratto.json
  • 11:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T399728)', diff saved to https://phabricator.wikimedia.org/P79730 and previous config saved to /var/cache/conftool/dbconfig/20250723-114853-fceratto.json
  • 11:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 11:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T399728)', diff saved to https://phabricator.wikimedia.org/P79729 and previous config saved to /var/cache/conftool/dbconfig/20250723-114740-fceratto.json
  • 11:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P79728 and previous config saved to /var/cache/conftool/dbconfig/20250723-113233-fceratto.json
  • 11:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P79727 and previous config saved to /var/cache/conftool/dbconfig/20250723-111725-fceratto.json
  • 11:13 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:07 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2198.codfw.wmnet
  • 11:07 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db2198.codfw.wmnet
  • 11:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T399728)', diff saved to https://phabricator.wikimedia.org/P79726 and previous config saved to /var/cache/conftool/dbconfig/20250723-110217-fceratto.json
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T399728)', diff saved to https://phabricator.wikimedia.org/P79725 and previous config saved to /var/cache/conftool/dbconfig/20250723-105941-fceratto.json
  • 10:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T399728)', diff saved to https://phabricator.wikimedia.org/P79724 and previous config saved to /var/cache/conftool/dbconfig/20250723-105919-fceratto.json
  • 10:56 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:51 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1171.eqiad.wmnet
  • 10:51 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db1171.eqiad.wmnet
  • 10:49 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:47 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:45 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P79723 and previous config saved to /var/cache/conftool/dbconfig/20250723-104412-fceratto.json
  • 10:44 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: upgrade mariadb
  • 10:43 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:40 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:37 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:36 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:36 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:29 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: upgrade mariadb
  • 10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P79722 and previous config saved to /var/cache/conftool/dbconfig/20250723-102905-fceratto.json
  • 10:27 arnaudb@cumin1003: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1004.wikimedia.org to gitlab1003.wikimedia.org
  • 10:23 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'https://gitlab-replica-a.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 10:23 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache 'https://gitlab-replica-a.wikimedia.org/ https://gitlab-replica-b.wikimedia.org/' on all recursors
  • 10:23 arnaudb@dns1004: END - running authdns-update
  • 10:21 arnaudb@dns1004: START - running authdns-update
  • 10:17 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:16 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T399728)', diff saved to https://phabricator.wikimedia.org/P79721 and previous config saved to /var/cache/conftool/dbconfig/20250723-101358-fceratto.json
  • 10:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T399728)', diff saved to https://phabricator.wikimedia.org/P79720 and previous config saved to /var/cache/conftool/dbconfig/20250723-101226-fceratto.json
  • 10:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T399728)', diff saved to https://phabricator.wikimedia.org/P79719 and previous config saved to /var/cache/conftool/dbconfig/20250723-101204-fceratto.json
  • 10:01 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:01 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:01 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:59 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:59 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P79718 and previous config saved to /var/cache/conftool/dbconfig/20250723-095656-fceratto.json
  • 09:56 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:54 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:49 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P79717 and previous config saved to /var/cache/conftool/dbconfig/20250723-094149-fceratto.json
  • 09:28 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:27 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T399728)', diff saved to https://phabricator.wikimedia.org/P79716 and previous config saved to /var/cache/conftool/dbconfig/20250723-092641-fceratto.json
  • 09:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T399728)', diff saved to https://phabricator.wikimedia.org/P79715 and previous config saved to /var/cache/conftool/dbconfig/20250723-092359-fceratto.json
  • 09:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T399728)', diff saved to https://phabricator.wikimedia.org/P79714 and previous config saved to /var/cache/conftool/dbconfig/20250723-092336-fceratto.json
  • 09:15 arnaudb@cumin1003: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1004.wikimedia.org to gitlab1003.wikimedia.org
  • 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P79712 and previous config saved to /var/cache/conftool/dbconfig/20250723-090829-fceratto.json
  • 08:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P79711 and previous config saved to /var/cache/conftool/dbconfig/20250723-085321-fceratto.json
  • 08:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T399728)', diff saved to https://phabricator.wikimedia.org/P79710 and previous config saved to /var/cache/conftool/dbconfig/20250723-083814-fceratto.json
  • 08:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T399728)', diff saved to https://phabricator.wikimedia.org/P79709 and previous config saved to /var/cache/conftool/dbconfig/20250723-083531-fceratto.json
  • 08:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T399728)', diff saved to https://phabricator.wikimedia.org/P79708 and previous config saved to /var/cache/conftool/dbconfig/20250723-083508-fceratto.json
  • 08:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P79707 and previous config saved to /var/cache/conftool/dbconfig/20250723-082000-fceratto.json
  • 08:15 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:15 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P79706 and previous config saved to /var/cache/conftool/dbconfig/20250723-080453-fceratto.json
  • 07:51 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:51 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T399728)', diff saved to https://phabricator.wikimedia.org/P79705 and previous config saved to /var/cache/conftool/dbconfig/20250723-074945-fceratto.json
  • 07:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T399728)', diff saved to https://phabricator.wikimedia.org/P79704 and previous config saved to /var/cache/conftool/dbconfig/20250723-074700-fceratto.json
  • 07:46 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 07:19 mvernon@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7:00:00 on aqs1012.eqiad.wmnet with reason: wait for eevans
  • 02:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T399728)', diff saved to https://phabricator.wikimedia.org/P79703 and previous config saved to /var/cache/conftool/dbconfig/20250723-021507-fceratto.json
  • 02:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T399249)', diff saved to https://phabricator.wikimedia.org/P79702 and previous config saved to /var/cache/conftool/dbconfig/20250723-020643-marostegui.json
  • 02:04 swfrench@deploy1003: Finished scap sync-world: Test deployment to verify new php8.1 images - T383557 (duration: 34m 39s)
  • 02:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P79701 and previous config saved to /var/cache/conftool/dbconfig/20250723-015959-fceratto.json
  • 01:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P79700 and previous config saved to /var/cache/conftool/dbconfig/20250723-015135-marostegui.json
  • 01:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P79699 and previous config saved to /var/cache/conftool/dbconfig/20250723-014451-fceratto.json
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P79698 and previous config saved to /var/cache/conftool/dbconfig/20250723-013627-marostegui.json
  • 01:32 swfrench@deploy1003: Started scap sync-world: Test deployment to verify new php8.1 images - T383557
  • 01:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T399728)', diff saved to https://phabricator.wikimedia.org/P79697 and previous config saved to /var/cache/conftool/dbconfig/20250723-012944-fceratto.json
  • 01:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T399728)', diff saved to https://phabricator.wikimedia.org/P79696 and previous config saved to /var/cache/conftool/dbconfig/20250723-012559-fceratto.json
  • 01:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T399249)', diff saved to https://phabricator.wikimedia.org/P79695 and previous config saved to /var/cache/conftool/dbconfig/20250723-012120-marostegui.json
  • 00:46 swfrench-wmf: rebuilt php8.1 production images (8.1.33-1-s2) on build2001 - T383557
  • 00:43 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 00:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T399728)', diff saved to https://phabricator.wikimedia.org/P79694 and previous config saved to /var/cache/conftool/dbconfig/20250723-004014-fceratto.json
  • 00:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T399728)', diff saved to https://phabricator.wikimedia.org/P79693 and previous config saved to /var/cache/conftool/dbconfig/20250723-003740-fceratto.json
  • 00:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 00:37 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 00:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 00:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T399728)', diff saved to https://phabricator.wikimedia.org/P79692 and previous config saved to /var/cache/conftool/dbconfig/20250723-003535-fceratto.json
  • 00:33 swfrench-wmf: ran DISTRIBUTIONS="bullseye" build-base-images on build2001 - T383557
  • 00:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T399249)', diff saved to https://phabricator.wikimedia.org/P79691 and previous config saved to /var/cache/conftool/dbconfig/20250723-002129-marostegui.json
  • 00:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 00:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T399249)', diff saved to https://phabricator.wikimedia.org/P79690 and previous config saved to /var/cache/conftool/dbconfig/20250723-002106-marostegui.json
  • 00:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P79689 and previous config saved to /var/cache/conftool/dbconfig/20250723-002024-fceratto.json
  • 00:17 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2006-dev.codfw.wmnet with OS bullseye
  • 00:15 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P79688 and previous config saved to /var/cache/conftool/dbconfig/20250723-000558-marostegui.json
  • 00:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P79687 and previous config saved to /var/cache/conftool/dbconfig/20250723-000516-fceratto.json

2025-07-22

  • 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P79686 and previous config saved to /var/cache/conftool/dbconfig/20250722-235051-marostegui.json
  • 23:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T399728)', diff saved to https://phabricator.wikimedia.org/P79685 and previous config saved to /var/cache/conftool/dbconfig/20250722-235009-fceratto.json
  • 23:50 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2025.codfw.wmnet with OS bookworm
  • 23:46 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T399728)', diff saved to https://phabricator.wikimedia.org/P79684 and previous config saved to /var/cache/conftool/dbconfig/20250722-234625-fceratto.json
  • 23:46 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T399728)', diff saved to https://phabricator.wikimedia.org/P79683 and previous config saved to /var/cache/conftool/dbconfig/20250722-234602-fceratto.json
  • 23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T399249)', diff saved to https://phabricator.wikimedia.org/P79682 and previous config saved to /var/cache/conftool/dbconfig/20250722-233543-marostegui.json
  • 23:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P79681 and previous config saved to /var/cache/conftool/dbconfig/20250722-233055-fceratto.json
  • 23:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P79680 and previous config saved to /var/cache/conftool/dbconfig/20250722-231547-fceratto.json
  • 23:14 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2025.codfw.wmnet with reason: host reimage
  • 23:08 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2025.codfw.wmnet with reason: host reimage
  • 23:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T399728)', diff saved to https://phabricator.wikimedia.org/P79679 and previous config saved to /var/cache/conftool/dbconfig/20250722-230039-fceratto.json
  • 22:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T399728)', diff saved to https://phabricator.wikimedia.org/P79678 and previous config saved to /var/cache/conftool/dbconfig/20250722-225657-fceratto.json
  • 22:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 22:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T399728)', diff saved to https://phabricator.wikimedia.org/P79677 and previous config saved to /var/cache/conftool/dbconfig/20250722-225634-fceratto.json
  • 22:49 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2025.codfw.wmnet with OS bookworm
  • 22:47 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2024.codfw.wmnet with OS bookworm
  • 22:43 ejegg: fundraising civicrm upgraded from 5eed1b2a to 3c23a5c0
  • 22:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P79676 and previous config saved to /var/cache/conftool/dbconfig/20250722-224126-fceratto.json
  • 22:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T399249)', diff saved to https://phabricator.wikimedia.org/P79675 and previous config saved to /var/cache/conftool/dbconfig/20250722-223603-marostegui.json
  • 22:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 22:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T399249)', diff saved to https://phabricator.wikimedia.org/P79674 and previous config saved to /var/cache/conftool/dbconfig/20250722-223540-marostegui.json
  • 22:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P79673 and previous config saved to /var/cache/conftool/dbconfig/20250722-222619-fceratto.json
  • 22:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P79672 and previous config saved to /var/cache/conftool/dbconfig/20250722-222033-marostegui.json
  • 22:12 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2024.codfw.wmnet with reason: host reimage
  • 22:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T399728)', diff saved to https://phabricator.wikimedia.org/P79671 and previous config saved to /var/cache/conftool/dbconfig/20250722-221111-fceratto.json
  • 22:08 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2024.codfw.wmnet with reason: host reimage
  • 22:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T399728)', diff saved to https://phabricator.wikimedia.org/P79670 and previous config saved to /var/cache/conftool/dbconfig/20250722-220730-fceratto.json
  • 22:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 22:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T399728)', diff saved to https://phabricator.wikimedia.org/P79669 and previous config saved to /var/cache/conftool/dbconfig/20250722-220707-fceratto.json
  • 22:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P79668 and previous config saved to /var/cache/conftool/dbconfig/20250722-220525-marostegui.json
  • 21:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:53 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 21:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P79667 and previous config saved to /var/cache/conftool/dbconfig/20250722-215200-fceratto.json
  • 21:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T399249)', diff saved to https://phabricator.wikimedia.org/P79666 and previous config saved to /var/cache/conftool/dbconfig/20250722-215018-marostegui.json
  • 21:50 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2024.codfw.wmnet with OS bookworm
  • 21:49 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 21:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:45 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:41 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logstash2023.codfw.wmnet with OS bookworm
  • 21:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P79665 and previous config saved to /var/cache/conftool/dbconfig/20250722-213652-fceratto.json
  • 21:29 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 21:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T399728)', diff saved to https://phabricator.wikimedia.org/P79664 and previous config saved to /var/cache/conftool/dbconfig/20250722-212144-fceratto.json
  • 21:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T399728)', diff saved to https://phabricator.wikimedia.org/P79662 and previous config saved to /var/cache/conftool/dbconfig/20250722-211803-fceratto.json
  • 21:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 21:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T399728)', diff saved to https://phabricator.wikimedia.org/P79661 and previous config saved to /var/cache/conftool/dbconfig/20250722-211739-fceratto.json
  • 21:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P79660 and previous config saved to /var/cache/conftool/dbconfig/20250722-210232-fceratto.json
  • 20:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T399249)', diff saved to https://phabricator.wikimedia.org/P79659 and previous config saved to /var/cache/conftool/dbconfig/20250722-205039-marostegui.json
  • 20:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T399249)', diff saved to https://phabricator.wikimedia.org/P79658 and previous config saved to /var/cache/conftool/dbconfig/20250722-205017-marostegui.json
  • 20:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P79657 and previous config saved to /var/cache/conftool/dbconfig/20250722-204725-fceratto.json
  • 20:38 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 20:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P79656 and previous config saved to /var/cache/conftool/dbconfig/20250722-203509-marostegui.json
  • 20:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T399728)', diff saved to https://phabricator.wikimedia.org/P79655 and previous config saved to /var/cache/conftool/dbconfig/20250722-203217-fceratto.json
  • 20:31 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 20:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T399728)', diff saved to https://phabricator.wikimedia.org/P79654 and previous config saved to /var/cache/conftool/dbconfig/20250722-202835-fceratto.json
  • 20:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 20:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T399728)', diff saved to https://phabricator.wikimedia.org/P79653 and previous config saved to /var/cache/conftool/dbconfig/20250722-202813-fceratto.json
  • 20:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P79651 and previous config saved to /var/cache/conftool/dbconfig/20250722-202002-marostegui.json
  • 20:15 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 20:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P79650 and previous config saved to /var/cache/conftool/dbconfig/20250722-201305-fceratto.json
  • 20:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T399249)', diff saved to https://phabricator.wikimedia.org/P79649 and previous config saved to /var/cache/conftool/dbconfig/20250722-200454-marostegui.json
  • 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P79648 and previous config saved to /var/cache/conftool/dbconfig/20250722-195757-fceratto.json
  • 19:44 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:43 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T399728)', diff saved to https://phabricator.wikimedia.org/P79647 and previous config saved to /var/cache/conftool/dbconfig/20250722-194250-fceratto.json
  • 19:41 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:41 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:41 ottomata: deploying eventgate-logging-external and eventgate-analytics-external to pick up meta.dt change - T376026
  • 19:41 eileen: civicrm upgraded from 63c1860b to 5eed1b2a
  • 19:40 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:39 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:39 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T399728)', diff saved to https://phabricator.wikimedia.org/P79646 and previous config saved to /var/cache/conftool/dbconfig/20250722-193908-fceratto.json
  • 19:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 19:38 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 19:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T399728)', diff saved to https://phabricator.wikimedia.org/P79645 and previous config saved to /var/cache/conftool/dbconfig/20250722-193846-fceratto.json
  • 19:37 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:36 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 19:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P79644 and previous config saved to /var/cache/conftool/dbconfig/20250722-192338-fceratto.json
  • 19:15 jgreen@dns1004: END - running authdns-update
  • 19:14 jgreen@dns1004: START - running authdns-update
  • 19:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P79643 and previous config saved to /var/cache/conftool/dbconfig/20250722-190830-fceratto.json
  • 19:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T399249)', diff saved to https://phabricator.wikimedia.org/P79642 and previous config saved to /var/cache/conftool/dbconfig/20250722-190144-marostegui.json
  • 19:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 18:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T399728)', diff saved to https://phabricator.wikimedia.org/P79641 and previous config saved to /var/cache/conftool/dbconfig/20250722-185323-fceratto.json
  • 18:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T399728)', diff saved to https://phabricator.wikimedia.org/P79640 and previous config saved to /var/cache/conftool/dbconfig/20250722-184932-fceratto.json
  • 18:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:49 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh7004.wikimedia.org with OS bookworm
  • 18:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T399728)', diff saved to https://phabricator.wikimedia.org/P79639 and previous config saved to /var/cache/conftool/dbconfig/20250722-184909-fceratto.json
  • 18:46 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7003.magru.wmnet with OS bookworm
  • 18:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7004.magru.wmnet with OS bookworm
  • 18:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P79638 and previous config saved to /var/cache/conftool/dbconfig/20250722-183402-fceratto.json
  • 18:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P79637 and previous config saved to /var/cache/conftool/dbconfig/20250722-181854-fceratto.json
  • 18:17 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.11 refs T396372
  • 18:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T399728)', diff saved to https://phabricator.wikimedia.org/P79636 and previous config saved to /var/cache/conftool/dbconfig/20250722-180347-fceratto.json
  • 18:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 17:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T399728)', diff saved to https://phabricator.wikimedia.org/P79635 and previous config saved to /var/cache/conftool/dbconfig/20250722-175943-fceratto.json
  • 17:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 17:59 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7004.magru.wmnet with reason: host reimage
  • 17:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 17:54 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7004.magru.wmnet with reason: host reimage
  • 17:47 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbprov2003.codfw.wmnet,dbprov1003.eqiad.wmnet
  • 17:47 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for dbprov2003.codfw.wmnet,dbprov1003.eqiad.wmnet
  • 17:47 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2141.codfw.wmnet
  • 17:47 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db2141.codfw.wmnet
  • 17:44 topranks: un-drain Arelion 100G transport circuit IC-374549 cr1-eqiad <-> cr1-codfw after service restoration T399097
  • 17:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 17:39 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
  • 17:32 swfrench@deploy1003: Finished scap sync-world: Make data-gateway mesh listener available - T368096 (duration: 06m 46s)
  • 17:32 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbprov2003.codfw.wmnet,dbprov1003.eqiad.wmnet with reason: upgrade mariadb
  • 17:32 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh7004.wikimedia.org with reason: host reimage
  • 17:31 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2141.codfw.wmnet with reason: upgrade mariadb
  • 17:29 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on doh7004.wikimedia.org with reason: host reimage
  • 17:27 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2023.codfw.wmnet with reason: host reimage
  • 17:27 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum7004.magru.wmnet with OS bookworm
  • 17:26 swfrench@deploy1003: Started scap sync-world: Make data-gateway mesh listener available - T368096
  • 17:23 cwhite@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2023.codfw.wmnet with reason: host reimage
  • 17:14 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:14 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:13 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:13 jhancock@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2091']
  • 17:13 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:12 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1150.eqiad.wmnet
  • 17:12 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db1150.eqiad.wmnet
  • 17:12 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 17:12 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:06 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2091']
  • 17:05 cwhite@cumin2002: START - Cookbook sre.hosts.reimage for host logstash2023.codfw.wmnet with OS bookworm
  • 17:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T399249)', diff saved to https://phabricator.wikimedia.org/P79634 and previous config saved to /var/cache/conftool/dbconfig/20250722-170347-marostegui.json
  • 17:01 jhancock@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cirrussearch2091']
  • 16:58 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host doh7004.wikimedia.org with OS bookworm
  • 16:58 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS bookworm
  • 16:53 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2091']
  • 16:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P79633 and previous config saved to /var/cache/conftool/dbconfig/20250722-164838-marostegui.json
  • 16:47 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cirrussearch2091']
  • 16:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P79632 and previous config saved to /var/cache/conftool/dbconfig/20250722-163330-marostegui.json
  • 16:30 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cirrussearch2091']
  • 16:20 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cirrussearch2091.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T399249)', diff saved to https://phabricator.wikimedia.org/P79631 and previous config saved to /var/cache/conftool/dbconfig/20250722-161823-marostegui.json
  • 16:16 sukhe: sudo cumin -b1 -s10 "A:dnsbox" "run-puppet-agent --enable 'merging CR 1170570'"
  • 16:14 sukhe@dns7002: END - running authdns-update
  • 16:13 sukhe@dns7002: START - running authdns-update
  • 16:13 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org [reason: testing]
  • 16:10 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org [reason: testing]
  • 16:10 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7002.magru.wmnet [reason: testing]
  • 16:10 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cirrussearch2091.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 16:06 sukhe: sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1170570'": T362392
  • 16:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T399728)', diff saved to https://phabricator.wikimedia.org/P79630 and previous config saved to /var/cache/conftool/dbconfig/20250722-160419-fceratto.json
  • 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P79629 and previous config saved to /var/cache/conftool/dbconfig/20250722-154912-fceratto.json
  • 15:34 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.1 - cmooney@cumin1003
  • 15:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P79628 and previous config saved to /var/cache/conftool/dbconfig/20250722-153404-fceratto.json
  • 15:31 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.1 - cmooney@cumin1003
  • 15:20 zabe@deploy1003: Finished scap sync-world: Backport for Set virtual domain for GlobalUsage (T400169) (duration: 08m 26s)
  • 15:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T399728)', diff saved to https://phabricator.wikimedia.org/P79627 and previous config saved to /var/cache/conftool/dbconfig/20250722-151857-fceratto.json
  • 15:14 zabe@deploy1003: zabe: Continuing with sync
  • 15:14 zabe@deploy1003: zabe: Backport for Set virtual domain for GlobalUsage (T400169) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T399728)', diff saved to https://phabricator.wikimedia.org/P79626 and previous config saved to /var/cache/conftool/dbconfig/20250722-151355-fceratto.json
  • 15:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T399728)', diff saved to https://phabricator.wikimedia.org/P79625 and previous config saved to /var/cache/conftool/dbconfig/20250722-151331-fceratto.json
  • 15:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T399249)', diff saved to https://phabricator.wikimedia.org/P79624 and previous config saved to /var/cache/conftool/dbconfig/20250722-151224-marostegui.json
  • 15:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T399249)', diff saved to https://phabricator.wikimedia.org/P79623 and previous config saved to /var/cache/conftool/dbconfig/20250722-151201-marostegui.json
  • 15:11 zabe@deploy1003: Started scap sync-world: Backport for Set virtual domain for GlobalUsage (T400169)
  • 15:07 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:07 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 14:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P79622 and previous config saved to /var/cache/conftool/dbconfig/20250722-145823-fceratto.json
  • 14:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P79621 and previous config saved to /var/cache/conftool/dbconfig/20250722-145652-marostegui.json
  • 14:54 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 14:53 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 14:48 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 14:48 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: upgrade mariadb
  • 14:48 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 14:43 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 14:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P79619 and previous config saved to /var/cache/conftool/dbconfig/20250722-144316-fceratto.json
  • 14:42 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 14:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P79618 and previous config saved to /var/cache/conftool/dbconfig/20250722-144145-marostegui.json
  • 14:36 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki metawiki --exceptions countryExceptionMappings.csv
  • 14:32 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2035
  • 14:32 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2035
  • 14:32 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:30 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:29 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:29 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:28 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T399728)', diff saved to https://phabricator.wikimedia.org/P79612 and previous config saved to /var/cache/conftool/dbconfig/20250722-142808-fceratto.json
  • 14:27 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 14:27 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T399249)', diff saved to https://phabricator.wikimedia.org/P79611 and previous config saved to /var/cache/conftool/dbconfig/20250722-142637-marostegui.json
  • 14:26 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T399728)', diff saved to https://phabricator.wikimedia.org/P79610 and previous config saved to /var/cache/conftool/dbconfig/20250722-142302-fceratto.json
  • 14:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T399728)', diff saved to https://phabricator.wikimedia.org/P79609 and previous config saved to /var/cache/conftool/dbconfig/20250722-142239-fceratto.json
  • 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P79608 and previous config saved to /var/cache/conftool/dbconfig/20250722-140731-fceratto.json
  • 14:07 XioNoX: setup BGP to inter.link in esams
  • 13:57 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P79607 and previous config saved to /var/cache/conftool/dbconfig/20250722-135223-fceratto.json
  • 13:48 jgreen@dns1004: END - running authdns-update
  • 13:47 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 13:47 jgreen@dns1004: START - running authdns-update
  • 13:45 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS bookworm
  • 13:45 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 13:44 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 13:44 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 13:43 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 13:39 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T399728)', diff saved to https://phabricator.wikimedia.org/P79606 and previous config saved to /var/cache/conftool/dbconfig/20250722-133716-fceratto.json
  • 13:36 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Modifications to UpdateCountriesScript (T397270) (duration: 08m 37s)
  • 13:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T399728)', diff saved to https://phabricator.wikimedia.org/P79605 and previous config saved to /var/cache/conftool/dbconfig/20250722-133211-fceratto.json
  • 13:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T399728)', diff saved to https://phabricator.wikimedia.org/P79604 and previous config saved to /var/cache/conftool/dbconfig/20250722-133148-fceratto.json
  • 13:31 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Backport for Modifications to UpdateCountriesScript (T397270) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 13:28 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Modifications to UpdateCountriesScript (T397270)
  • 13:25 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 13:23 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:23 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T399249)', diff saved to https://phabricator.wikimedia.org/P79603 and previous config saved to /var/cache/conftool/dbconfig/20250722-132133-marostegui.json
  • 13:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T399249)', diff saved to https://phabricator.wikimedia.org/P79602 and previous config saved to /var/cache/conftool/dbconfig/20250722-132107-marostegui.json
  • 13:19 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:19 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:18 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:18 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:17 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:17 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit2002, replica=gerrit2003)
  • 13:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P79601 and previous config saved to /var/cache/conftool/dbconfig/20250722-131640-fceratto.json
  • 13:16 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Use new `sul` dblist for $wmgCampaignEventsUseCentralDB (duration: 11m 54s)
  • 13:13 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 13:13 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 13:12 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 13:12 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 13:10 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:07 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfix on rename - oblivian@cumin1003"
  • 13:07 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfix on rename - oblivian@cumin1003
  • 13:06 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfix on rename - oblivian@cumin1003
  • 13:06 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfix on rename - oblivian@cumin1003"
  • 13:06 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Backport for Use new `sul` dblist for $wmgCampaignEventsUseCentralDB synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P79600 and previous config saved to /var/cache/conftool/dbconfig/20250722-130600-marostegui.json
  • 13:04 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Use new `sul` dblist for $wmgCampaignEventsUseCentralDB
  • 13:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P79599 and previous config saved to /var/cache/conftool/dbconfig/20250722-130133-fceratto.json
  • 12:53 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P79598 and previous config saved to /var/cache/conftool/dbconfig/20250722-125052-marostegui.json
  • 12:49 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 12:49 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T399728)', diff saved to https://phabricator.wikimedia.org/P79597 and previous config saved to /var/cache/conftool/dbconfig/20250722-124626-fceratto.json
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T399728)', diff saved to https://phabricator.wikimedia.org/P79596 and previous config saved to /var/cache/conftool/dbconfig/20250722-124121-fceratto.json
  • 12:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T399728)', diff saved to https://phabricator.wikimedia.org/P79595 and previous config saved to /var/cache/conftool/dbconfig/20250722-124058-fceratto.json
  • 12:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T399249)', diff saved to https://phabricator.wikimedia.org/P79594 and previous config saved to /var/cache/conftool/dbconfig/20250722-123545-marostegui.json
  • 12:29 ayounsi@dns1004: END - running authdns-update
  • 12:29 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:29 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:28 ayounsi@dns1004: START - running authdns-update
  • 12:28 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:27 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw2 decom - ayounsi@cumin1003"
  • 12:26 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw2 decom - ayounsi@cumin1003"
  • 12:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P79593 and previous config saved to /var/cache/conftool/dbconfig/20250722-122551-fceratto.json
  • 12:21 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "force sync after netmask changes netbox - cmooney@cumin1003"
  • 12:21 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:18 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "force sync after netmask changes netbox - cmooney@cumin1003"
  • 12:12 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "force sync after netmask changes netbox - cmooney@cumin1003"
  • 12:12 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "force sync after netmask changes netbox - cmooney@cumin1003"
  • 12:11 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:11 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P79592 and previous config saved to /var/cache/conftool/dbconfig/20250722-121043-fceratto.json
  • 12:08 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:08 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:08 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:07 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:06 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "force sync after netmask changes netbox - cmooney@cumin1003"
  • 12:06 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "force sync after netmask changes netbox - cmooney@cumin1003"
  • 12:01 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:00 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:58 logmsgbot: dreamyjazz Deployed security patch for T399627
  • 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T399728)', diff saved to https://phabricator.wikimedia.org/P79591 and previous config saved to /var/cache/conftool/dbconfig/20250722-115536-fceratto.json
  • 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T399728)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250722-115029-fceratto.json
  • 11:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 11:50 logmsgbot: dreamyjazz Deployed security patch for T399627
  • 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T399728)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250722-115002-fceratto.json
  • 11:37 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2239.codfw.wmnet
  • 11:37 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db2239.codfw.wmnet
  • 11:35 samtar@deploy1003: Finished scap sync-world: Backport for IS: Set wgTemplateDataEnableFeaturedTemplates default true (T391064) (duration: 13m 16s)
  • 11:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P79589 and previous config saved to /var/cache/conftool/dbconfig/20250722-113454-fceratto.json
  • 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T399249)', diff saved to https://phabricator.wikimedia.org/P79588 and previous config saved to /var/cache/conftool/dbconfig/20250722-112929-marostegui.json
  • 11:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T399249)', diff saved to https://phabricator.wikimedia.org/P79587 and previous config saved to /var/cache/conftool/dbconfig/20250722-112907-marostegui.json
  • 11:28 samtar@deploy1003: samtar: Continuing with sync
  • 11:26 samtar@deploy1003: samtar: Backport for IS: Set wgTemplateDataEnableFeaturedTemplates default true (T391064) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:22 samtar@deploy1003: Started scap sync-world: Backport for IS: Set wgTemplateDataEnableFeaturedTemplates default true (T391064)
  • 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P79586 and previous config saved to /var/cache/conftool/dbconfig/20250722-111947-fceratto.json
  • 11:18 phuedx@deploy1003: Finished scap sync-world: Backport for Revert "InstrumentConfigsFetcher: Make updating configs asynchronous" (duration: 38m 33s)
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P79585 and previous config saved to /var/cache/conftool/dbconfig/20250722-111400-marostegui.json
  • 11:05 phuedx@deploy1003: phuedx: Continuing with sync
  • 11:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T399728)', diff saved to https://phabricator.wikimedia.org/P79584 and previous config saved to /var/cache/conftool/dbconfig/20250722-110440-fceratto.json
  • 11:04 phuedx@deploy1003: phuedx: Backport for Revert "InstrumentConfigsFetcher: Make updating configs asynchronous" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T399728)', diff saved to https://phabricator.wikimedia.org/P79583 and previous config saved to /var/cache/conftool/dbconfig/20250722-105936-fceratto.json
  • 10:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T399728)', diff saved to https://phabricator.wikimedia.org/P79582 and previous config saved to /var/cache/conftool/dbconfig/20250722-105914-fceratto.json
  • 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P79581 and previous config saved to /var/cache/conftool/dbconfig/20250722-105852-marostegui.json
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1253 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79580 and previous config saved to /var/cache/conftool/dbconfig/20250722-105345-root.json
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79579 and previous config saved to /var/cache/conftool/dbconfig/20250722-104719-root.json
  • 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P79578 and previous config saved to /var/cache/conftool/dbconfig/20250722-104407-fceratto.json
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T399249)', diff saved to https://phabricator.wikimedia.org/P79577 and previous config saved to /var/cache/conftool/dbconfig/20250722-104345-marostegui.json
  • 10:40 phuedx@deploy1003: Started scap sync-world: Backport for Revert "InstrumentConfigsFetcher: Make updating configs asynchronous"
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1253 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79576 and previous config saved to /var/cache/conftool/dbconfig/20250722-103840-root.json
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79575 and previous config saved to /var/cache/conftool/dbconfig/20250722-103213-root.json
  • 10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P79574 and previous config saved to /var/cache/conftool/dbconfig/20250722-102859-fceratto.json
  • 10:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 10:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 10:23 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 10:23 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1253 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79573 and previous config saved to /var/cache/conftool/dbconfig/20250722-102334-root.json
  • 10:23 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 10:23 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79572 and previous config saved to /var/cache/conftool/dbconfig/20250722-101707-root.json
  • 10:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T399728)', diff saved to https://phabricator.wikimedia.org/P79571 and previous config saved to /var/cache/conftool/dbconfig/20250722-101352-fceratto.json
  • 10:12 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1243.eqiad.wmnet
  • 10:12 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1243.eqiad.wmnet
  • 10:09 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:09 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T399728)', diff saved to https://phabricator.wikimedia.org/P79570 and previous config saved to /var/cache/conftool/dbconfig/20250722-100851-fceratto.json
  • 10:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T399728)', diff saved to https://phabricator.wikimedia.org/P79569 and previous config saved to /var/cache/conftool/dbconfig/20250722-100829-fceratto.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1253 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79568 and previous config saved to /var/cache/conftool/dbconfig/20250722-100828-root.json
  • 10:07 claime: homer "cr*eqiad*" commit 'wikikube-worker1243 to active'
  • 10:06 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 10:04 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2208 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79567 and previous config saved to /var/cache/conftool/dbconfig/20250722-100201-root.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1253 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79566 and previous config saved to /var/cache/conftool/dbconfig/20250722-100040-marostegui.json
  • 10:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 09:56 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1240.eqiad.wmnet
  • 09:56 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db1240.eqiad.wmnet
  • 09:55 dcaro@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2208 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79565 and previous config saved to /var/cache/conftool/dbconfig/20250722-095402-marostegui.json
  • 09:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P79564 and previous config saved to /var/cache/conftool/dbconfig/20250722-095321-fceratto.json
  • 09:47 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: upgrade mariadb
  • 09:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T399249)', diff saved to https://phabricator.wikimedia.org/P79561 and previous config saved to /var/cache/conftool/dbconfig/20250722-093901-marostegui.json
  • 09:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 09:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P79560 and previous config saved to /var/cache/conftool/dbconfig/20250722-093814-fceratto.json
  • 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T399728)', diff saved to https://phabricator.wikimedia.org/P79558 and previous config saved to /var/cache/conftool/dbconfig/20250722-092306-fceratto.json
  • 09:18 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: upgrade mariadb
  • 09:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T399728)', diff saved to https://phabricator.wikimedia.org/P79556 and previous config saved to /var/cache/conftool/dbconfig/20250722-091800-fceratto.json
  • 09:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T399728)', diff saved to https://phabricator.wikimedia.org/P79555 and previous config saved to /var/cache/conftool/dbconfig/20250722-091737-fceratto.json
  • 09:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P79554 and previous config saved to /var/cache/conftool/dbconfig/20250722-090230-fceratto.json
  • 08:55 dcaro@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 08:51 dcaro@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 08:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P79553 and previous config saved to /var/cache/conftool/dbconfig/20250722-084722-fceratto.json
  • 08:35 dcaro@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 08:33 dcaro@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 08:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow2004.codfw.wmnet with OS bookworm
  • 08:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T399728)', diff saved to https://phabricator.wikimedia.org/P79552 and previous config saved to /var/cache/conftool/dbconfig/20250722-083215-fceratto.json
  • 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T399728)', diff saved to https://phabricator.wikimedia.org/P79551 and previous config saved to /var/cache/conftool/dbconfig/20250722-082819-fceratto.json
  • 08:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 08:23 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:23 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:13 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow2004.codfw.wmnet with reason: host reimage
  • 08:10 apergos: Ran fixStuckGlobalRename.php for T400117
  • 08:07 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow2004.codfw.wmnet with reason: host reimage
  • 07:53 dcaro@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 07:50 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host netflow2004.codfw.wmnet with OS bookworm
  • 07:40 ayounsi@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow2004.codfw.wmnet
  • 07:40 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netflow2004.codfw.wmnet with OS bookworm
  • 06:59 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host netflow2004.codfw.wmnet with OS bookworm
  • 06:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2004.codfw.wmnet - ayounsi@cumin1003"
  • 06:58 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow2004.codfw.wmnet - ayounsi@cumin1003"
  • 06:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow2004.codfw.wmnet on all recursors
  • 06:58 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache netflow2004.codfw.wmnet on all recursors
  • 06:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2004.codfw.wmnet - ayounsi@cumin1003"
  • 06:57 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow2004.codfw.wmnet - ayounsi@cumin1003"
  • 06:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es2035.codfw.wmnet with reason: Maintenance
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2035 T399927', diff saved to https://phabricator.wikimedia.org/P79550 and previous config saved to /var/cache/conftool/dbconfig/20250722-065454-root.json
  • 06:54 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 06:54 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host netflow2004.codfw.wmnet
  • 06:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 06:35 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 06:35 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 06:14 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 06:13 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 04:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:02 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.8 (duration: 01m 50s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.11 refs T396372
  • 00:33 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 00:26 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on cirrussearch1081.eqiad.wmnet with reason: bad cluster node/MSS issue
  • 00:23 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 00:22 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 00:09 ryankemper: stop opensearch on `cirrussearch1081`
  • 00:00 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227

2025-07-21

  • 23:56 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on remaining s2 and s3 wikis (duration: 08m 19s)
  • 23:53 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=search,name=eqiad
  • 23:51 zabe@deploy1003: zabe: Continuing with sync
  • 23:50 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on remaining s2 and s3 wikis synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:48 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on remaining s2 and s3 wikis
  • 23:32 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on remaining large s2 wikis (T397912) (duration: 08m 23s)
  • 23:27 zabe@deploy1003: zabe: Continuing with sync
  • 23:26 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on remaining large s2 wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:24 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on remaining large s2 wikis (T397912)
  • 23:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 23:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T399249)', diff saved to https://phabricator.wikimedia.org/P79549 and previous config saved to /var/cache/conftool/dbconfig/20250721-231417-marostegui.json
  • 22:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P79548 and previous config saved to /var/cache/conftool/dbconfig/20250721-225910-marostegui.json
  • 22:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P79547 and previous config saved to /var/cache/conftool/dbconfig/20250721-224402-marostegui.json
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T399249)', diff saved to https://phabricator.wikimedia.org/P79546 and previous config saved to /var/cache/conftool/dbconfig/20250721-222855-marostegui.json
  • 22:24 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 22:22 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 22:05 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 22:02 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 21:52 maryum: undeploy security fix for T399627
  • 21:39 maryum: deploy security fix for T399662
  • 21:39 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 21:39 maryum: deploy security fix for T399662
  • 21:38 bking@cumin2002: conftool action : set/weight=10; selector: name=cirrussearch2061.*
  • 21:38 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2061.*
  • 21:38 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2061*
  • 21:34 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:32 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2061
  • 21:32 bking@cumin2002: conftool action : set/weight=10; selector: name=cirrussearch2061
  • 21:29 maryum: deploy security patch for T399627
  • 21:28 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:25 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 21:21 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:21 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:14 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:14 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:13 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 21:13 eileen: civicrm upgraded from 60b2a914 to 63c1860b
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T399249)', diff saved to https://phabricator.wikimedia.org/P79544 and previous config saved to /var/cache/conftool/dbconfig/20250721-210531-marostegui.json
  • 21:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T399249)', diff saved to https://phabricator.wikimedia.org/P79542 and previous config saved to /var/cache/conftool/dbconfig/20250721-210501-marostegui.json
  • 21:04 cjming: end of UTC late backport window
  • 21:04 cjming@deploy1003: Finished scap sync-world: Backport for InstrumentConfigsFetcher: Make updating configs asynchronous (T398422) (duration: 46m 25s)
  • 20:55 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 20:51 cjming@deploy1003: cjming: Continuing with sync
  • 20:51 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2005-dev.codfw.wmnet with OS bullseye
  • 20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P79541 and previous config saved to /var/cache/conftool/dbconfig/20250721-204954-marostegui.json
  • 20:41 cjming@deploy1003: cjming: Backport for InstrumentConfigsFetcher: Make updating configs asynchronous (T398422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P79540 and previous config saved to /var/cache/conftool/dbconfig/20250721-203446-marostegui.json
  • 20:33 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 20:30 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 20:27 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.*
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T399249)', diff saved to https://phabricator.wikimedia.org/P79539 and previous config saved to /var/cache/conftool/dbconfig/20250721-201939-marostegui.json
  • 20:17 cjming@deploy1003: Started scap sync-world: Backport for InstrumentConfigsFetcher: Make updating configs asynchronous (T398422)
  • 20:15 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 20:14 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2005-dev.codfw.wmnet with OS bullseye
  • 20:14 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd2005-dev.codfw.wmnet with OS bullseye
  • 20:13 bvibber@deploy1003: Finished scap sync-world: Backport for xLab: Add instrumentation for logged-out user retention (T399227) (duration: 10m 54s)
  • 20:13 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 20:11 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 20:10 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 20:09 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 20:06 bvibber@deploy1003: ksarabia, bvibber: Continuing with sync
  • 20:05 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2005-dev.codfw.wmnet with OS bullseye
  • 20:04 bvibber@deploy1003: ksarabia, bvibber: Backport for xLab: Add instrumentation for logged-out user retention (T399227) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:02 bvibber@deploy1003: Started scap sync-world: Backport for xLab: Add instrumentation for logged-out user retention (T399227)
  • 20:01 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 19:59 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 19:59 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 19:58 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 19:49 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 19:49 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 19:49 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2064.codfw.wmnet|cirrussearch2073.codfw.wmnet|cirrussearch2078.codfw.wmnet|cirrussearch2094.codfw.wmnet|cirrussearch2095.codfw.wmnet|cirrussearch2096.codfw.wmnet|cirrussearch2110.codfw.wmnet
  • 19:36 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 19:34 bking@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 19:21 bking@cumin1002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 19:21 bking@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 19:18 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 19:12 bking@cumin1002: START - Cookbook sre.hosts.reimage for host cirrussearch2091.codfw.wmnet with OS bullseye
  • 19:12 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 18:54 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 18:53 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T399249)', diff saved to https://phabricator.wikimedia.org/P79537 and previous config saved to /var/cache/conftool/dbconfig/20250721-185203-marostegui.json
  • 18:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T399249)', diff saved to https://phabricator.wikimedia.org/P79536 and previous config saved to /var/cache/conftool/dbconfig/20250721-185152-marostegui.json
  • 18:38 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 18:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P79535 and previous config saved to /var/cache/conftool/dbconfig/20250721-183645-marostegui.json
  • 18:35 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 18:32 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2078
  • 18:30 bking@cumin2002: conftool action : set/weight=10; selector: name=cirrussearch2078
  • 18:30 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2094
  • 18:30 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2078
  • 18:27 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T399162 - bking@cumin1002
  • 18:25 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T399162 - bking@cumin1002
  • 18:25 bking@cumin2002: conftool action : set/weight=10; selector: name=cirrussearch2079
  • 18:25 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2079
  • 18:24 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T399162 - bking@cumin1002
  • 18:22 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T399162 - bking@cumin1002
  • 18:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P79534 and previous config saved to /var/cache/conftool/dbconfig/20250721-182137-marostegui.json
  • 18:20 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cirrussearch2095
  • 18:20 bking@cumin2002: conftool action : set/weight=10; selector: name=cirrussearch2095
  • 18:17 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T399162 - bking@cumin1002
  • 18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T399249)', diff saved to https://phabricator.wikimedia.org/P79533 and previous config saved to /var/cache/conftool/dbconfig/20250721-180630-marostegui.json
  • 17:55 brett@dns1004: END - running authdns-update
  • 17:54 brett@dns1004: START - running authdns-update
  • 17:53 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: T399162 - bking@cumin1002
  • 17:42 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: T399162 - bking@cumin1002
  • 17:35 jynus: recovering database data T399980
  • 17:27 mutante: deploying varnish change on cp4037 as test host
  • 17:24 mutante: disabling puppet on A:cp (112 hosts) to deploy gerrit:117941 T274228
  • 17:18 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bullseye
  • 17:13 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: T399162 - bking@cumin1002
  • 17:09 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 16:51 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:51 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T399249)', diff saved to https://phabricator.wikimedia.org/P79532 and previous config saved to /var/cache/conftool/dbconfig/20250721-164008-marostegui.json
  • 16:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T399249)', diff saved to https://phabricator.wikimedia.org/P79531 and previous config saved to /var/cache/conftool/dbconfig/20250721-163945-marostegui.json
  • 16:36 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2006-dev.codfw.wmnet with OS bullseye
  • 16:25 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:25 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:25 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P79530 and previous config saved to /var/cache/conftool/dbconfig/20250721-162437-marostegui.json
  • 16:18 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 16:14 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 16:10 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 16:10 vgutierrez: cumin 'A:cp' 'systemctl reset-failed update-ocsp-all.timer' - T399114
  • 16:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P79528 and previous config saved to /var/cache/conftool/dbconfig/20250721-160930-marostegui.json
  • 16:08 jforrester@deploy1003: Finished scap sync-world: Backport for ZLangRegistry::fetchLanguageCodeFromZid: Check for invalid Title too (T399755), Provide a repo-mode pair of parser functions for showing label/description (duration: 37m 34s)
  • 16:08 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Dell support
  • 16:08 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.*
  • 16:06 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 16:00 bking@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 15:57 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2006-dev.codfw.wmnet with OS bullseye
  • 15:56 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2006-dev.codfw.wmnet with OS bullseye
  • 15:56 jforrester@deploy1003: jforrester: Continuing with sync
  • 15:55 jforrester@deploy1003: jforrester: Backport for ZLangRegistry::fetchLanguageCodeFromZid: Check for invalid Title too (T399755), Provide a repo-mode pair of parser functions for showing label/description synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:54 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T399249)', diff saved to https://phabricator.wikimedia.org/P79527 and previous config saved to /var/cache/conftool/dbconfig/20250721-155421-marostegui.json
  • 15:50 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 15:48 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 15:32 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 15:31 jforrester@deploy1003: Started scap sync-world: Backport for ZLangRegistry::fetchLanguageCodeFromZid: Check for invalid Title too (T399755), Provide a repo-mode pair of parser functions for showing label/description
  • 15:27 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 15:08 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 14:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T395241)', diff saved to https://phabricator.wikimedia.org/P79526 and previous config saved to /var/cache/conftool/dbconfig/20250721-144358-fceratto.json
  • 14:42 bking@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 14:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 14:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:38 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh7003.wikimedia.org with OS bookworm
  • 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T395241)', diff saved to https://phabricator.wikimedia.org/P79525 and previous config saved to /var/cache/conftool/dbconfig/20250721-143658-fceratto.json
  • 14:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T399249)', diff saved to https://phabricator.wikimedia.org/P79524 and previous config saved to /var/cache/conftool/dbconfig/20250721-142811-marostegui.json
  • 14:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T399249)', diff saved to https://phabricator.wikimedia.org/P79523 and previous config saved to /var/cache/conftool/dbconfig/20250721-142749-marostegui.json
  • 14:20 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T398014', diff saved to https://phabricator.wikimedia.org/P79521 and previous config saved to /var/cache/conftool/dbconfig/20250721-142011-fceratto.json
  • 14:19 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2006-dev.codfw.wmnet with OS bullseye
  • 14:18 federico3: Starting s1 codfw failover from db2203 to db2212 - T398014
  • 14:18 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
  • 14:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Clean up some settings for special wikis no longer in wikipedia group (T183549) (duration: 11m 34s)
  • 14:14 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P79520 and previous config saved to /var/cache/conftool/dbconfig/20250721-141242-marostegui.json
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T398014', diff saved to https://phabricator.wikimedia.org/P79519 and previous config saved to /var/cache/conftool/dbconfig/20250721-141126-fceratto.json
  • 14:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T398014
  • 14:09 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 14:05 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Backport for Clean up some settings for special wikis no longer in wikipedia group (T183549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Clean up some settings for special wikis no longer in wikipedia group (T183549)
  • 13:59 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Add a test to verify that "normal" DBLists contain only SUL wikis (T183549) (duration: 08m 01s)
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P79518 and previous config saved to /var/cache/conftool/dbconfig/20250721-135735-marostegui.json
  • 13:54 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:53 bking@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
  • 13:53 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, daimona: Backport for Add a test to verify that "normal" DBLists contain only SUL wikis (T183549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:53 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/ using stat1009.eqiad.wmnet)
  • 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Add a test to verify that "normal" DBLists contain only SUL wikis (T183549)
  • 13:50 bking@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/ using stat1009.eqiad.wmnet)
  • 13:49 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs://wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/ using stat1009.eqiad.wmnet)
  • 13:49 bking@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs://wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/ using stat1009.eqiad.wmnet)
  • 13:49 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Add phan and use it to detect duplicated array keys (duration: 10m 03s)
  • 13:48 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs://wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/ using stat1009.eqiad.wmnet)
  • 13:48 bking@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs://wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250714/wiki=wikidata/ using stat1009.eqiad.wmnet)
  • 13:47 cmooney@dns2005: END - running authdns-update
  • 13:46 cmooney@dns2005: START - running authdns-update
  • 13:45 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host doh7003.wikimedia.org with OS bookworm
  • 13:44 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde: Continuing with sync
  • 13:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T399249)', diff saved to https://phabricator.wikimedia.org/P79517 and previous config saved to /var/cache/conftool/dbconfig/20250721-134227-marostegui.json
  • 13:41 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde: Backport for Add phan and use it to detect duplicated array keys synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:39 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Add phan and use it to detect duplicated array keys
  • 13:34 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:34 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:33 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Move special wikis outside of the 'wikipedia' group (T183549) (duration: 14m 48s)
  • 13:27 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde: Continuing with sync
  • 13:25 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:25 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:20 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde: Backport for Move special wikis outside of the 'wikipedia' group (T183549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:18 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Move special wikis outside of the 'wikipedia' group (T183549)
  • 13:15 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Explicitly set wgServer etc. for private wikis under the 'wikipedia' dblist (T183549), Enable wbui2025 mobile user interface on Wikidata Beta (T399703) (duration: 11m 50s)
  • 13:13 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:13 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:10 lucaswerkmeister-wmde@deploy1003: jforrester, arthurtaylor, lucaswerkmeister-wmde: Continuing with sync
  • 13:06 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:06 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:05 lucaswerkmeister-wmde@deploy1003: jforrester, arthurtaylor, lucaswerkmeister-wmde: Backport for Explicitly set wgServer etc. for private wikis under the 'wikipedia' dblist (T183549), Enable wbui2025 mobile user interface on Wikidata Beta (T399703) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Explicitly set wgServer etc. for private wikis under the 'wikipedia' dblist (T183549), Enable wbui2025 mobile user interface on Wikidata Beta (T399703)
  • 12:39 XioNoX: deploy CR1169662 to test and magru routed ganeti
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T399249)', diff saved to https://phabricator.wikimedia.org/P79516 and previous config saved to /var/cache/conftool/dbconfig/20250721-121549-marostegui.json
  • 12:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T399249)', diff saved to https://phabricator.wikimedia.org/P79515 and previous config saved to /var/cache/conftool/dbconfig/20250721-121526-marostegui.json
  • 12:15 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:15 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:14 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:14 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P79514 and previous config saved to /var/cache/conftool/dbconfig/20250721-120019-marostegui.json
  • 11:57 ayounsi@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir7004.magru.wmnet
  • 11:56 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:56 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:56 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:55 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:53 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for backup1007.eqiad.wmnet
  • 11:53 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for backup1007.eqiad.wmnet
  • 11:50 XioNoX: depool ncredir7004 for ganeti7002 bird upgrade
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P79513 and previous config saved to /var/cache/conftool/dbconfig/20250721-114511-marostegui.json
  • 11:30 XioNoX: depool and move ncredir7003 to ganeti7003
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T399249)', diff saved to https://phabricator.wikimedia.org/P79512 and previous config saved to /var/cache/conftool/dbconfig/20250721-113004-marostegui.json
  • 11:24 XioNoX: move doh7003 (insetup) to ganeti7002
  • 10:50 samtar@deploy1003: Finished scap sync-world: Backport for IS/IS-labs: Initial state of wgTemplateDataEnableFeaturedTemplates (T391064) (duration: 15m 20s)
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79511 and previous config saved to /var/cache/conftool/dbconfig/20250721-104439-root.json
  • 10:42 samtar@deploy1003: samtar: Continuing with sync
  • 10:36 samtar@deploy1003: samtar: Backport for IS/IS-labs: Initial state of wgTemplateDataEnableFeaturedTemplates (T391064) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79510 and previous config saved to /var/cache/conftool/dbconfig/20250721-103520-root.json
  • 10:34 samtar@deploy1003: Started scap sync-world: Backport for IS/IS-labs: Initial state of wgTemplateDataEnableFeaturedTemplates (T391064)
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79509 and previous config saved to /var/cache/conftool/dbconfig/20250721-102934-root.json
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79508 and previous config saved to /var/cache/conftool/dbconfig/20250721-102014-root.json
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79507 and previous config saved to /var/cache/conftool/dbconfig/20250721-101426-root.json
  • 10:13 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 10:05 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79506 and previous config saved to /var/cache/conftool/dbconfig/20250721-100508-root.json
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T399249)', diff saved to https://phabricator.wikimedia.org/P79505 and previous config saved to /var/cache/conftool/dbconfig/20250721-100418-marostegui.json
  • 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:03 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 10:03 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 10:02 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 10:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2182 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79504 and previous config saved to /var/cache/conftool/dbconfig/20250721-095918-root.json
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2182 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79503 and previous config saved to /var/cache/conftool/dbconfig/20250721-095112-marostegui.json
  • 09:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79502 and previous config saved to /var/cache/conftool/dbconfig/20250721-095009-root.json
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79501 and previous config saved to /var/cache/conftool/dbconfig/20250721-095001-root.json
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79500 and previous config saved to /var/cache/conftool/dbconfig/20250721-094636-root.json
  • 09:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1194 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79499 and previous config saved to /var/cache/conftool/dbconfig/20250721-094221-marostegui.json
  • 09:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79498 and previous config saved to /var/cache/conftool/dbconfig/20250721-093504-root.json
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79497 and previous config saved to /var/cache/conftool/dbconfig/20250721-093131-root.json
  • 09:20 XioNoX: manually install bird2_2.17.1+branch.mq.bgp.multilisten.c47b08 on ganeti2033 and ganeti700x - T362392
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79496 and previous config saved to /var/cache/conftool/dbconfig/20250721-091958-root.json
  • 09:18 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1246.eqiad.wmnet
  • 09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1246.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 09:18 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1246.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79495 and previous config saved to /var/cache/conftool/dbconfig/20250721-091625-root.json
  • 09:15 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Drop bw_limit_duration from haproxy_action - oblivian@cumin1003"
  • 09:15 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Drop bw_limit_duration from haproxy_action - oblivian@cumin1003
  • 09:15 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Drop bw_limit_duration from haproxy_action - oblivian@cumin1003
  • 09:15 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Drop bw_limit_duration from haproxy_action - oblivian@cumin1003"
  • 09:14 marostegui@cumin1002: START - Cookbook sre.dns.netbox
  • 09:08 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1246.eqiad.wmnet
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79494 and previous config saved to /var/cache/conftool/dbconfig/20250721-090830-root.json
  • 09:05 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79493 and previous config saved to /var/cache/conftool/dbconfig/20250721-090452-root.json
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79492 and previous config saved to /var/cache/conftool/dbconfig/20250721-090119-root.json
  • 08:59 fabfur: restarting haproxykafka service on cp5017
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1191 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79491 and previous config saved to /var/cache/conftool/dbconfig/20250721-085719-marostegui.json
  • 08:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2168 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79490 and previous config saved to /var/cache/conftool/dbconfig/20250721-085333-marostegui.json
  • 08:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79489 and previous config saved to /var/cache/conftool/dbconfig/20250721-085325-root.json
  • 08:52 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79488 and previous config saved to /var/cache/conftool/dbconfig/20250721-084604-root.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79487 and previous config saved to /var/cache/conftool/dbconfig/20250721-084334-root.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79486 and previous config saved to /var/cache/conftool/dbconfig/20250721-083819-root.json
  • 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79485 and previous config saved to /var/cache/conftool/dbconfig/20250721-083058-root.json
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79484 and previous config saved to /var/cache/conftool/dbconfig/20250721-082829-root.json
  • 08:26 awight: slow morning deployment finished
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79483 and previous config saved to /var/cache/conftool/dbconfig/20250721-082337-root.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79482 and previous config saved to /var/cache/conftool/dbconfig/20250721-082313-root.json
  • 08:21 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79481 and previous config saved to /var/cache/conftool/dbconfig/20250721-081553-root.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79480 and previous config saved to /var/cache/conftool/dbconfig/20250721-081323-root.json
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2209 T399930', diff saved to https://phabricator.wikimedia.org/P79479 and previous config saved to /var/cache/conftool/dbconfig/20250721-080951-marostegui.json
  • 08:09 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2205 to s3 primary T399930', diff saved to https://phabricator.wikimedia.org/P79478 and previous config saved to /var/cache/conftool/dbconfig/20250721-080907-marostegui.json
  • 08:08 marostegui: Starting s3 codfw failover from db2209 to db2205 - T399930
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79477 and previous config saved to /var/cache/conftool/dbconfig/20250721-080831-root.json
  • 08:07 awight@deploy1003: Finished scap sync-world: Backport for Revert "VE: Enforce referenceslist reserialization when MW changed" (T400013 T396017) (duration: 37m 13s)
  • 08:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s3 T399930
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2205 with weight 0 T399930', diff saved to https://phabricator.wikimedia.org/P79476 and previous config saved to /var/cache/conftool/dbconfig/20250721-080528-root.json
  • 08:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Maintenance in s3
  • 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79475 and previous config saved to /var/cache/conftool/dbconfig/20250721-080047-root.json
  • 08:00 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:59 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79474 and previous config saved to /var/cache/conftool/dbconfig/20250721-075817-root.json
  • 07:55 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 07:54 awight@deploy1003: wmde-fisch, awight: Continuing with sync
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79473 and previous config saved to /var/cache/conftool/dbconfig/20250721-075325-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79472 and previous config saved to /var/cache/conftool/dbconfig/20250721-075256-marostegui.json
  • 07:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 07:51 awight@deploy1003: wmde-fisch, awight: Backport for Revert "VE: Enforce referenceslist reserialization when MW changed" (T400013 T396017) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79471 and previous config saved to /var/cache/conftool/dbconfig/20250721-075120-root.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79470 and previous config saved to /var/cache/conftool/dbconfig/20250721-075025-marostegui.json
  • 07:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79469 and previous config saved to /var/cache/conftool/dbconfig/20250721-073819-root.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79468 and previous config saved to /var/cache/conftool/dbconfig/20250721-073614-root.json
  • 07:30 awight@deploy1003: Started scap sync-world: Backport for Revert "VE: Enforce referenceslist reserialization when MW changed" (T400013 T396017)
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79467 and previous config saved to /var/cache/conftool/dbconfig/20250721-072313-root.json
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79466 and previous config saved to /var/cache/conftool/dbconfig/20250721-072108-root.json
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1037', diff saved to https://phabricator.wikimedia.org/P79465 and previous config saved to /var/cache/conftool/dbconfig/20250721-072037-root.json
  • 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'Remove weight from es7 master', diff saved to https://phabricator.wikimedia.org/P79464 and previous config saved to /var/cache/conftool/dbconfig/20250721-071949-marostegui.json
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set es6 eqiad back to read-write - T400027', diff saved to https://phabricator.wikimedia.org/P79463 and previous config saved to /var/cache/conftool/dbconfig/20250721-071744-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Set es6 eqiad as read-only for maintenance - T400027', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250721-071604-marostegui.json
  • 07:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Maintenance in es6
  • 07:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2235].codfw.wmnet,db[1164,1217].eqiad.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 07:09 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79461 and previous config saved to /var/cache/conftool/dbconfig/20250721-070910-root.json
  • 07:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2018.codfw.wmnet,pc1018.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 07:06 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79459 and previous config saved to /var/cache/conftool/dbconfig/20250721-070602-root.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79458 and previous config saved to /var/cache/conftool/dbconfig/20250721-070243-root.json
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set es7 eqiad back to read-write', diff saved to https://phabricator.wikimedia.org/P79457 and previous config saved to /var/cache/conftool/dbconfig/20250721-065710-marostegui.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79456 and previous config saved to /var/cache/conftool/dbconfig/20250721-065405-root.json
  • 06:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Primary switchover es7 T400028
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79455 and previous config saved to /var/cache/conftool/dbconfig/20250721-065057-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Set es7 eqiad as read-only for maintenance', diff saved to https://phabricator.wikimedia.org/P79454 and previous config saved to /var/cache/conftool/dbconfig/20250721-065049-marostegui.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1039', diff saved to https://phabricator.wikimedia.org/P79453 and previous config saved to /var/cache/conftool/dbconfig/20250721-064755-root.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79452 and previous config saved to /var/cache/conftool/dbconfig/20250721-064738-root.json
  • 06:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1039.eqiad.wmnet with reason: Maintenance
  • 06:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79451 and previous config saved to /var/cache/conftool/dbconfig/20250721-063859-root.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79450 and previous config saved to /var/cache/conftool/dbconfig/20250721-063232-root.json
  • 06:31 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79449 and previous config saved to /var/cache/conftool/dbconfig/20250721-062353-root.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2150 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79448 and previous config saved to /var/cache/conftool/dbconfig/20250721-061726-root.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1170 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79447 and previous config saved to /var/cache/conftool/dbconfig/20250721-061606-marostegui.json
  • 06:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2150 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79446 and previous config saved to /var/cache/conftool/dbconfig/20250721-060923-marostegui.json
  • 06:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 05:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 04:59 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:56 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .

2025-07-19

  • 19:09 Ammar: Ran fixStuckGlobalRename.php for T399985
  • 11:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T399249)', diff saved to https://phabricator.wikimedia.org/P79444 and previous config saved to /var/cache/conftool/dbconfig/20250719-112151-marostegui.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P79443 and previous config saved to /var/cache/conftool/dbconfig/20250719-110644-marostegui.json
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P79442 and previous config saved to /var/cache/conftool/dbconfig/20250719-105135-marostegui.json
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T399249)', diff saved to https://phabricator.wikimedia.org/P79441 and previous config saved to /var/cache/conftool/dbconfig/20250719-103628-marostegui.json
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T399249)', diff saved to https://phabricator.wikimedia.org/P79440 and previous config saved to /var/cache/conftool/dbconfig/20250719-080850-marostegui.json
  • 08:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T399249)', diff saved to https://phabricator.wikimedia.org/P79439 and previous config saved to /var/cache/conftool/dbconfig/20250719-080837-marostegui.json
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P79438 and previous config saved to /var/cache/conftool/dbconfig/20250719-075330-marostegui.json
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P79437 and previous config saved to /var/cache/conftool/dbconfig/20250719-073822-marostegui.json
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T399249)', diff saved to https://phabricator.wikimedia.org/P79436 and previous config saved to /var/cache/conftool/dbconfig/20250719-072315-marostegui.json
  • 06:59 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:59 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 06:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:55 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T399249)', diff saved to https://phabricator.wikimedia.org/P79433 and previous config saved to /var/cache/conftool/dbconfig/20250719-050137-marostegui.json
  • 05:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T399249)', diff saved to https://phabricator.wikimedia.org/P79432 and previous config saved to /var/cache/conftool/dbconfig/20250719-050114-marostegui.json
  • 04:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P79431 and previous config saved to /var/cache/conftool/dbconfig/20250719-044607-marostegui.json
  • 04:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P79430 and previous config saved to /var/cache/conftool/dbconfig/20250719-043058-marostegui.json
  • 04:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T399249)', diff saved to https://phabricator.wikimedia.org/P79429 and previous config saved to /var/cache/conftool/dbconfig/20250719-041550-marostegui.json
  • 01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T399249)', diff saved to https://phabricator.wikimedia.org/P79428 and previous config saved to /var/cache/conftool/dbconfig/20250719-015846-marostegui.json
  • 01:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T399249)', diff saved to https://phabricator.wikimedia.org/P79427 and previous config saved to /var/cache/conftool/dbconfig/20250719-015823-marostegui.json
  • 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P79426 and previous config saved to /var/cache/conftool/dbconfig/20250719-014315-marostegui.json
  • 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P79425 and previous config saved to /var/cache/conftool/dbconfig/20250719-012808-marostegui.json
  • 01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T399249)', diff saved to https://phabricator.wikimedia.org/P79424 and previous config saved to /var/cache/conftool/dbconfig/20250719-011301-marostegui.json

2025-07-18

  • 22:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T399249)', diff saved to https://phabricator.wikimedia.org/P79422 and previous config saved to /var/cache/conftool/dbconfig/20250718-225658-marostegui.json
  • 22:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T399249)', diff saved to https://phabricator.wikimedia.org/P79421 and previous config saved to /var/cache/conftool/dbconfig/20250718-225635-marostegui.json
  • 22:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P79420 and previous config saved to /var/cache/conftool/dbconfig/20250718-224127-marostegui.json
  • 22:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P79419 and previous config saved to /var/cache/conftool/dbconfig/20250718-222620-marostegui.json
  • 22:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T399249)', diff saved to https://phabricator.wikimedia.org/P79418 and previous config saved to /var/cache/conftool/dbconfig/20250718-221112-marostegui.json
  • 21:57 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:57 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:49 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:49 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:28 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 21:28 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:14 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:13 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T399249)', diff saved to https://phabricator.wikimedia.org/P79417 and previous config saved to /var/cache/conftool/dbconfig/20250718-194951-marostegui.json
  • 19:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79416 and previous config saved to /var/cache/conftool/dbconfig/20250718-194938-marostegui.json
  • 19:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P79415 and previous config saved to /var/cache/conftool/dbconfig/20250718-193431-marostegui.json
  • 19:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P79414 and previous config saved to /var/cache/conftool/dbconfig/20250718-191924-marostegui.json
  • 19:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79413 and previous config saved to /var/cache/conftool/dbconfig/20250718-190416-marostegui.json
  • 17:49 jgleeson: civicrm upgraded from bf098cc5 to 60b2a914
  • 16:55 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 16:55 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 16:55 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 16:55 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 16:55 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 16:55 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 16:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79411 and previous config saved to /var/cache/conftool/dbconfig/20250718-164128-marostegui.json
  • 16:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 16:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T399249)', diff saved to https://phabricator.wikimedia.org/P79410 and previous config saved to /var/cache/conftool/dbconfig/20250718-164105-marostegui.json
  • 16:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P79409 and previous config saved to /var/cache/conftool/dbconfig/20250718-162557-marostegui.json
  • 16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P79408 and previous config saved to /var/cache/conftool/dbconfig/20250718-161050-marostegui.json
  • 15:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T399249)', diff saved to https://phabricator.wikimedia.org/P79407 and previous config saved to /var/cache/conftool/dbconfig/20250718-155542-marostegui.json
  • 15:25 hashar@deploy1003: Finished deploy [integration/docroot@6384514]: build: Updating mediawiki/mediawiki-phan-config to 0.16.0 (duration: 00m 12s)
  • 15:25 hashar@deploy1003: Started deploy [integration/docroot@6384514]: build: Updating mediawiki/mediawiki-phan-config to 0.16.0
  • 14:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2242.codfw.wmnet
  • 14:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2242 gradually with 4 steps - Upgrade of db2242.codfw.wmnet completed
  • 14:25 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:21 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:20 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:14 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79404 and previous config saved to /var/cache/conftool/dbconfig/20250718-141156-root.json
  • 14:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:05 Dreamy_Jazz: Running `foreachwiki AbuseFilter:PopulateAbuseFilterLogIPHex.php` for T397842
  • 14:05 Dreamy_Jazz: Stopped the previous command
  • 14:02 Dreamy_Jazz: Running `foreachwiki AbuseFilter:PopulateAbuseFilterLogIPHex.php --batch-size 1000 --sleep 1` for T397842
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79402 and previous config saved to /var/cache/conftool/dbconfig/20250718-135650-root.json
  • 13:45 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2242 gradually with 4 steps - Upgrade of db2242.codfw.wmnet completed
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79400 and previous config saved to /var/cache/conftool/dbconfig/20250718-134144-root.json
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79399 and previous config saved to /var/cache/conftool/dbconfig/20250718-134021-root.json
  • 13:39 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2242 - Upgrading db2242.codfw.wmnet
  • 13:39 marostegui@cumin1002: START - Cookbook sre.mysql.depool db2242 - Upgrading db2242.codfw.wmnet
  • 13:39 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2242.codfw.wmnet
  • 13:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2242.codfw.wmnet with reason: Maintenance
  • 13:37 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 13:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T399249)', diff saved to https://phabricator.wikimedia.org/P79397 and previous config saved to /var/cache/conftool/dbconfig/20250718-133533-marostegui.json
  • 13:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T399249)', diff saved to https://phabricator.wikimedia.org/P79396 and previous config saved to /var/cache/conftool/dbconfig/20250718-133424-marostegui.json
  • 13:30 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79395 and previous config saved to /var/cache/conftool/dbconfig/20250718-132638-root.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79394 and previous config saved to /var/cache/conftool/dbconfig/20250718-132515-root.json
  • 13:23 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on backup1007.eqiad.wmnet with reason: failed disk
  • 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P79393 and previous config saved to /var/cache/conftool/dbconfig/20250718-131917-marostegui.json
  • 13:17 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:17 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1212 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79392 and previous config saved to /var/cache/conftool/dbconfig/20250718-131554-marostegui.json
  • 13:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 10 hosts with reason: Maintenance
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79391 and previous config saved to /var/cache/conftool/dbconfig/20250718-131431-root.json
  • 13:14 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:12 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:12 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 65%: Repooling', diff saved to https://phabricator.wikimedia.org/P79390 and previous config saved to /var/cache/conftool/dbconfig/20250718-131009-root.json
  • 13:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:05 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:05 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P79389 and previous config saved to /var/cache/conftool/dbconfig/20250718-130410-marostegui.json
  • 13:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:59 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79388 and previous config saved to /var/cache/conftool/dbconfig/20250718-125925-root.json
  • 12:59 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 12:58 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 12:58 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 12:57 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:56 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:56 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:56 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:55 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:55 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79387 and previous config saved to /var/cache/conftool/dbconfig/20250718-125504-root.json
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T399249)', diff saved to https://phabricator.wikimedia.org/P79386 and previous config saved to /var/cache/conftool/dbconfig/20250718-124901-marostegui.json
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79385 and previous config saved to /var/cache/conftool/dbconfig/20250718-124419-root.json
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 35%: Repooling', diff saved to https://phabricator.wikimedia.org/P79384 and previous config saved to /var/cache/conftool/dbconfig/20250718-123958-root.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79383 and previous config saved to /var/cache/conftool/dbconfig/20250718-122914-root.json
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79382 and previous config saved to /var/cache/conftool/dbconfig/20250718-122452-root.json
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1198 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79381 and previous config saved to /var/cache/conftool/dbconfig/20250718-121901-marostegui.json
  • 12:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:12 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:10 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:09 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P79380 and previous config saved to /var/cache/conftool/dbconfig/20250718-120946-root.json
  • 12:08 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:07 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:05 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P79379 and previous config saved to /var/cache/conftool/dbconfig/20250718-115440-root.json
  • 11:49 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:49 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1259 (T399249)', diff saved to https://phabricator.wikimedia.org/P79377 and previous config saved to /var/cache/conftool/dbconfig/20250718-114618-marostegui.json
  • 11:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1259.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T399249)', diff saved to https://phabricator.wikimedia.org/P79376 and previous config saved to /var/cache/conftool/dbconfig/20250718-114555-marostegui.json
  • 11:43 marostegui: Restart pc7 T399540
  • 11:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:42 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'es1048 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P79374 and previous config saved to /var/cache/conftool/dbconfig/20250718-113933-root.json
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P79373 and previous config saved to /var/cache/conftool/dbconfig/20250718-113048-marostegui.json
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P79372 and previous config saved to /var/cache/conftool/dbconfig/20250718-111541-marostegui.json
  • 11:14 stevemunene@dns1004: END - running authdns-update
  • 11:13 stevemunene@dns1004: START - running authdns-update
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T399249)', diff saved to https://phabricator.wikimedia.org/P79371 and previous config saved to /var/cache/conftool/dbconfig/20250718-110033-marostegui.json
  • 10:37 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:37 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:36 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:35 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:35 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:35 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T399249)', diff saved to https://phabricator.wikimedia.org/P79370 and previous config saved to /var/cache/conftool/dbconfig/20250718-095938-marostegui.json
  • 09:59 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 09:45 arnaudb@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:43 arnaudb@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 09:41 arnaudb@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 09:41 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:41 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:39 arnaudb@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 09:36 arnaudb@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:35 arnaudb@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79369 and previous config saved to /var/cache/conftool/dbconfig/20250718-093410-root.json
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79367 and previous config saved to /var/cache/conftool/dbconfig/20250718-091904-root.json
  • 09:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79366 and previous config saved to /var/cache/conftool/dbconfig/20250718-090358-root.json
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Pool es1048 with 1% weight on es7 T395771', diff saved to https://phabricator.wikimedia.org/P79365 and previous config saved to /var/cache/conftool/dbconfig/20250718-085755-marostegui.json
  • 08:57 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 08:57 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T399249)', diff saved to https://phabricator.wikimedia.org/P79364 and previous config saved to /var/cache/conftool/dbconfig/20250718-085704-marostegui.json
  • 08:56 marostegui@cumin1002: dbctl commit (dc=all): 'Add es1048 to es7 depooled T395771', diff saved to https://phabricator.wikimedia.org/P79363 and previous config saved to /var/cache/conftool/dbconfig/20250718-085652-marostegui.json
  • 08:55 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79362 and previous config saved to /var/cache/conftool/dbconfig/20250718-084853-root.json
  • 08:46 elukey: elukey@kafkamon2003:~$ sudo systemctl restart burrow-main-codfw.service
  • 08:45 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 08:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P79361 and previous config saved to /var/cache/conftool/dbconfig/20250718-084129-marostegui.json
  • 08:41 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1189 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79360 and previous config saved to /var/cache/conftool/dbconfig/20250718-083831-marostegui.json
  • 08:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:35 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 08:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P79359 and previous config saved to /var/cache/conftool/dbconfig/20250718-082621-marostegui.json
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T399249)', diff saved to https://phabricator.wikimedia.org/P79358 and previous config saved to /var/cache/conftool/dbconfig/20250718-081114-marostegui.json
  • 07:56 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79357 and previous config saved to /var/cache/conftool/dbconfig/20250718-075532-root.json
  • 07:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:49 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:49 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:45 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 07:45 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79356 and previous config saved to /var/cache/conftool/dbconfig/20250718-074026-root.json
  • 07:34 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 07:34 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 07:34 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 07:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:31 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 07:30 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 07:29 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:29 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79355 and previous config saved to /var/cache/conftool/dbconfig/20250718-072520-root.json
  • 07:25 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:16 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T399249)', diff saved to https://phabricator.wikimedia.org/P79354 and previous config saved to /var/cache/conftool/dbconfig/20250718-071112-marostegui.json
  • 07:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1229 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79353 and previous config saved to /var/cache/conftool/dbconfig/20250718-071014-root.json
  • 07:10 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 07:06 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 07:04 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 06:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 06:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:22 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 06:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 01:10 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Remove routing for *.beta.wmflabs.org (T289318) (duration: 22m 13s)
  • 01:04 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:49 krinkle@deploy1003: krinkle: Backport for beta: Remove routing for *.beta.wmflabs.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:47 krinkle@deploy1003: Started scap sync-world: Backport for beta: Remove routing for *.beta.wmflabs.org (T289318)
  • 00:05 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Fix "Class Wikimedia\MWConfig\Exception not found" (duration: 21m 59s)

2025-07-17

  • 23:59 krinkle@deploy1003: krinkle: Continuing with sync
  • 23:45 krinkle@deploy1003: krinkle: Backport for multiversion: Fix "Class Wikimedia\MWConfig\Exception not found" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:43 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Fix "Class Wikimedia\MWConfig\Exception not found"
  • 23:03 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance - T399249
  • 22:57 mutante: [cumin1002:~] $ sudo dbctl instance db1229 depool
  • 22:25 zabe@deploy1003: Finished scap sync-world: Backport for PendingChangesPager: Stop using ANSI-89 joins (T399641) (duration: 08m 08s)
  • 22:20 zabe@deploy1003: jforrester, zabe: Continuing with sync
  • 22:19 zabe@deploy1003: jforrester, zabe: Backport for PendingChangesPager: Stop using ANSI-89 joins (T399641) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:17 zabe@deploy1003: Started scap sync-world: Backport for PendingChangesPager: Stop using ANSI-89 joins (T399641)
  • 22:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: repool eqsin to test backhaul cct packet loss, T399221]
  • 22:01 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: repool eqsin to test backhaul cct packet loss, T399221]
  • 21:48 samtar@deploy1003: Finished scap sync-world: Backport for Grant editpatrolprotected to sysops and bots (T399881) (duration: 12m 37s)
  • 21:41 samtar@deploy1003: aleksandar, samtar: Continuing with sync
  • 21:39 samtar@deploy1003: aleksandar, samtar: Backport for Grant editpatrolprotected to sysops and bots (T399881) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:35 samtar@deploy1003: Started scap sync-world: Backport for Grant editpatrolprotected to sysops and bots (T399881)
  • 21:31 samtar@deploy1003: Finished scap sync-world: Backport for Add editpatrolprotected messages (T399881) (duration: 37m 49s)
  • 21:19 samtar@deploy1003: zoranzoki21, samtar: Continuing with sync
  • 21:18 samtar@deploy1003: zoranzoki21, samtar: Backport for Add editpatrolprotected messages (T399881) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:55 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:53 samtar@deploy1003: Started scap sync-world: Backport for Add editpatrolprotected messages (T399881)
  • 20:52 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:52 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:51 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:50 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:43 bvibber@deploy1003: Finished scap sync-world: Backport for Database index hack to speed chartinfo API (T393950) (duration: 08m 47s)
  • 20:37 bvibber@deploy1003: bvibber: Continuing with sync
  • 20:36 bvibber@deploy1003: bvibber: Backport for Database index hack to speed chartinfo API (T393950) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:34 bvibber@deploy1003: Started scap sync-world: Backport for Database index hack to speed chartinfo API (T393950)
  • 20:20 sbisson@deploy1003: Finished scap sync-world: Backport for CX3 Build 1.0.0+20250717 (T388503 T395417 T395418) (duration: 11m 10s)
  • 20:14 sbisson@deploy1003: sbisson: Continuing with sync
  • 20:11 sbisson@deploy1003: sbisson: Backport for CX3 Build 1.0.0+20250717 (T388503 T395417 T395418) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 sbisson@deploy1003: Started scap sync-world: Backport for CX3 Build 1.0.0+20250717 (T388503 T395417 T395418)
  • 19:03 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 19:03 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:03 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 18:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1054.eqiad.wmnet with reason: host reimage
  • 18:43 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1054.eqiad.wmnet with reason: host reimage
  • 18:32 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2016.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:30 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 18:27 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:20 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:19 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host pc2016.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 18:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:12 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:12 dancy@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.10 refs T392180
  • 18:10 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:08 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 18:03 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:03 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:00 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2016
  • 18:00 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2016
  • 17:59 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:57 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 17:44 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:38 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:38 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1054
  • 17:37 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:37 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:37 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:36 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1054
  • 17:36 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:36 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:36 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ganeti1054 - vriley@cumin1002"
  • 17:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt ganeti1054 - vriley@cumin1002"
  • 17:32 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:14 swfrench@deploy1003: Finished scap sync-world: Migrate webserver-bookworm flavour back to (bookworm) mediawiki-httpd images - T378128 (duration: 09m 56s)
  • 17:08 swfrench@deploy1003: swfrench: Continuing with sync
  • 17:06 swfrench@deploy1003: swfrench: Migrate webserver-bookworm flavour back to (bookworm) mediawiki-httpd images - T378128 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:05 swfrench@deploy1003: Started scap sync-world: Migrate webserver-bookworm flavour back to (bookworm) mediawiki-httpd images - T378128
  • 16:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T399249)', diff saved to https://phabricator.wikimedia.org/P79350 and previous config saved to /var/cache/conftool/dbconfig/20250717-165345-marostegui.json
  • 16:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 16:40 mszabo@deploy1003: Finished scap sync-world: Backport for Load hCaptcha on first form interaction (T399849) (duration: 14m 26s)
  • 16:33 mszabo@deploy1003: mszabo: Continuing with sync
  • 16:30 mszabo@deploy1003: mszabo: Backport for Load hCaptcha on first form interaction (T399849) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:28 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:28 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:26 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:26 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 16:26 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 16:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup1007.eqiad.wmnet with reason: failed disk
  • 16:25 mszabo@deploy1003: Started scap sync-world: Backport for Load hCaptcha on first form interaction (T399849)
  • 16:24 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:04 kharlan@deploy1003: Finished scap sync-world: Backport for Prevent submissions of forms using hCaptcha until ready (T395619) (duration: 44m 46s)
  • 16:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
  • 16:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
  • 16:02 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 16:01 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 15:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T399249)', diff saved to https://phabricator.wikimedia.org/P79348 and previous config saved to /var/cache/conftool/dbconfig/20250717-155223-marostegui.json
  • 15:51 kharlan@deploy1003: kharlan: Continuing with sync
  • 15:50 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@9fc3ae8]: Pushing new artifacts (duration: 00m 17s)
  • 15:50 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@9fc3ae8]: Pushing new artifacts
  • 15:46 aqu@deploy1003: Finished deploy [airflow-dags/analytics@9fc3ae8]: Pushing new artifacts (duration: 00m 41s)
  • 15:45 aqu@deploy1003: Started deploy [airflow-dags/analytics@9fc3ae8]: Pushing new artifacts
  • 15:44 kharlan@deploy1003: kharlan: Backport for Prevent submissions of forms using hCaptcha until ready (T395619) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P79347 and previous config saved to /var/cache/conftool/dbconfig/20250717-153715-marostegui.json
  • 15:29 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:29 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 15:29 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:28 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 15:28 topranks: un-drain Arelion transport circuit from codfw -> eqsin to test performance T399221
  • 15:28 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 15:27 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P79346 and previous config saved to /var/cache/conftool/dbconfig/20250717-152207-marostegui.json
  • 15:19 kharlan@deploy1003: Started scap sync-world: Backport for Prevent submissions of forms using hCaptcha until ready (T395619)
  • 15:17 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:16 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:16 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:16 btullis@dns1004: END - running authdns-update
  • 15:15 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:15 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:15 btullis@dns1004: START - running authdns-update
  • 15:15 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:14 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:14 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:13 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:13 topranks: disable one of the 2x10G links connected to Equinix IXP Peering on cr1-codfw
  • 15:12 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:12 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T399249)', diff saved to https://phabricator.wikimedia.org/P79345 and previous config saved to /var/cache/conftool/dbconfig/20250717-150659-marostegui.json
  • 15:03 dancy@deploy1003: Installation of scap version "4.189.0" completed for 2 hosts
  • 15:01 dancy@deploy1003: Installing scap version "4.189.0" for 2 host(s)
  • 14:53 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 14:52 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 14:52 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:52 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 14:51 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:51 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:50 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 14:50 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 14:49 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:49 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 14:49 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 14:48 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 14:44 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 14:44 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 14:43 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:43 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 14:42 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:42 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:42 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 14:41 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 14:41 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:41 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 14:40 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 14:40 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 14:38 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: depool eqsin to test backhaul cct packet loss, T399221]
  • 14:38 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: depool eqsin to test backhaul cct packet loss, T399221]
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T399249)', diff saved to https://phabricator.wikimedia.org/P79344 and previous config saved to /var/cache/conftool/dbconfig/20250717-142228-marostegui.json
  • 14:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T399249)', diff saved to https://phabricator.wikimedia.org/P79343 and previous config saved to /var/cache/conftool/dbconfig/20250717-142205-marostegui.json
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P79342 and previous config saved to /var/cache/conftool/dbconfig/20250717-140658-marostegui.json
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P79341 and previous config saved to /var/cache/conftool/dbconfig/20250717-135150-marostegui.json
  • 13:39 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Create "abusefilter" editor user group for Vietnamese Wikipedia (T399535) (duration: 13m 12s)
  • 13:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T399249)', diff saved to https://phabricator.wikimedia.org/P79340 and previous config saved to /var/cache/conftool/dbconfig/20250717-133641-marostegui.json
  • 13:34 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, tryvix1509: Continuing with sync
  • 13:28 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, tryvix1509: Backport for Create "abusefilter" editor user group for Vietnamese Wikipedia (T399535) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:26 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Create "abusefilter" editor user group for Vietnamese Wikipedia (T399535)
  • 13:24 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Activate feature to resolve changelist wikibase link labels in all wikis (T388685) (duration: 11m 30s)
  • 13:19 lucaswerkmeister-wmde@deploy1003: joelyrookewmde, lucaswerkmeister-wmde: Continuing with sync
  • 13:15 lucaswerkmeister-wmde@deploy1003: joelyrookewmde, lucaswerkmeister-wmde: Backport for Activate feature to resolve changelist wikibase link labels in all wikis (T388685) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:13 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Activate feature to resolve changelist wikibase link labels in all wikis (T388685)
  • 13:00 btullis@dns1004: END - running authdns-update
  • 12:59 btullis@dns1004: START - running authdns-update
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T399249)', diff saved to https://phabricator.wikimedia.org/P79338 and previous config saved to /var/cache/conftool/dbconfig/20250717-125029-marostegui.json
  • 12:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T399249)', diff saved to https://phabricator.wikimedia.org/P79337 and previous config saved to /var/cache/conftool/dbconfig/20250717-125007-marostegui.json
  • 12:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P79336 and previous config saved to /var/cache/conftool/dbconfig/20250717-123459-marostegui.json
  • 12:28 jgleeson: SmashPig upgraded from f373fe4d to de30a87f
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P79335 and previous config saved to /var/cache/conftool/dbconfig/20250717-121952-marostegui.json
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79334 and previous config saved to /var/cache/conftool/dbconfig/20250717-120738-root.json
  • 12:05 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:05 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T399249)', diff saved to https://phabricator.wikimedia.org/P79333 and previous config saved to /var/cache/conftool/dbconfig/20250717-120444-marostegui.json
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79332 and previous config saved to /var/cache/conftool/dbconfig/20250717-120014-root.json
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79330 and previous config saved to /var/cache/conftool/dbconfig/20250717-115232-root.json
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79329 and previous config saved to /var/cache/conftool/dbconfig/20250717-114506-root.json
  • 11:41 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:41 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:38 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79327 and previous config saved to /var/cache/conftool/dbconfig/20250717-113726-root.json
  • 11:30 stevemunene@cumin1003: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1186.eqiad.wmnet
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79326 and previous config saved to /var/cache/conftool/dbconfig/20250717-113000-root.json
  • 11:27 stevemunene@cumin1003: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1186.eqiad.wmnet
  • 11:26 stevemunene@cumin1003: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1179.eqiad.wmnet
  • 11:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 11:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 11:24 stevemunene@cumin1003: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1179.eqiad.wmnet
  • 11:24 stevemunene@cumin1003: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1176.eqiad.wmnet
  • 11:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 11:22 stevemunene@cumin1003: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1176.eqiad.wmnet
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2227 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79325 and previous config saved to /var/cache/conftool/dbconfig/20250717-112220-root.json
  • 11:17 marostegui: Restart pc4 T399540
  • 11:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance
  • 11:15 elukey@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM install1004.wikimedia.org
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79323 and previous config saved to /var/cache/conftool/dbconfig/20250717-111454-root.json
  • 11:14 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:14 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2227 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79321 and previous config saved to /var/cache/conftool/dbconfig/20250717-111132-marostegui.json
  • 11:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 11:08 elukey@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM install1004.wikimedia.org
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1166 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79320 and previous config saved to /var/cache/conftool/dbconfig/20250717-110405-marostegui.json
  • 11:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:00 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T399249)', diff saved to https://phabricator.wikimedia.org/P79319 and previous config saved to /var/cache/conftool/dbconfig/20250717-105741-marostegui.json
  • 10:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T399249)', diff saved to https://phabricator.wikimedia.org/P79318 and previous config saved to /var/cache/conftool/dbconfig/20250717-105719-marostegui.json
  • 10:52 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 10:51 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 10:50 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:49 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:49 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:49 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:48 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:48 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:48 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P79317 and previous config saved to /var/cache/conftool/dbconfig/20250717-104211-marostegui.json
  • 10:28 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 10:27 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P79316 and previous config saved to /var/cache/conftool/dbconfig/20250717-102704-marostegui.json
  • 10:24 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 10:24 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 10:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:23 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:23 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:18 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:16 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30182
  • 10:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:14 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 30182
  • 10:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T399249)', diff saved to https://phabricator.wikimedia.org/P79315 and previous config saved to /var/cache/conftool/dbconfig/20250717-101156-marostegui.json
  • 09:36 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:36 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
  • 09:35 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:35 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
  • 09:34 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
  • 09:34 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T399249)', diff saved to https://phabricator.wikimedia.org/P79314 and previous config saved to /var/cache/conftool/dbconfig/20250717-092854-marostegui.json
  • 09:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79313 and previous config saved to /var/cache/conftool/dbconfig/20250717-092831-marostegui.json
  • 09:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:13 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P79312 and previous config saved to /var/cache/conftool/dbconfig/20250717-091323-marostegui.json
  • 09:12 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P79311 and previous config saved to /var/cache/conftool/dbconfig/20250717-085815-marostegui.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79310 and previous config saved to /var/cache/conftool/dbconfig/20250717-084308-marostegui.json
  • 08:07 gkyziridis@deploy1003: Finished scap sync-world: Backport for CX: Remove unused config related to database and cluster (T348513) (duration: 22m 15s)
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79309 and previous config saved to /var/cache/conftool/dbconfig/20250717-080159-root.json
  • 08:01 gkyziridis@deploy1003: gkyziridis, abi: Continuing with sync
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79308 and previous config saved to /var/cache/conftool/dbconfig/20250717-075720-root.json
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79307 and previous config saved to /var/cache/conftool/dbconfig/20250717-074728-root.json
  • 07:47 gkyziridis@deploy1003: gkyziridis, abi: Backport for CX: Remove unused config related to database and cluster (T348513) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79306 and previous config saved to /var/cache/conftool/dbconfig/20250717-074653-root.json
  • 07:45 gkyziridis@deploy1003: Started scap sync-world: Backport for CX: Remove unused config related to database and cluster (T348513)
  • 07:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79305 and previous config saved to /var/cache/conftool/dbconfig/20250717-074214-root.json
  • 07:38 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 07:38 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 07:38 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T399249)', diff saved to https://phabricator.wikimedia.org/P79304 and previous config saved to /var/cache/conftool/dbconfig/20250717-073506-marostegui.json
  • 07:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79303 and previous config saved to /var/cache/conftool/dbconfig/20250717-073223-root.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79302 and previous config saved to /var/cache/conftool/dbconfig/20250717-073147-root.json
  • 07:28 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79301 and previous config saved to /var/cache/conftool/dbconfig/20250717-072709-root.json
  • 07:20 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 07:20 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79300 and previous config saved to /var/cache/conftool/dbconfig/20250717-071844-root.json
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79299 and previous config saved to /var/cache/conftool/dbconfig/20250717-071717-root.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79298 and previous config saved to /var/cache/conftool/dbconfig/20250717-071642-root.json
  • 07:16 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable revertrisk filter for simplewiki and trwiki (T395668) (duration: 13m 25s)
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79297 and previous config saved to /var/cache/conftool/dbconfig/20250717-071201-root.json
  • 07:09 gkyziridis@deploy1003: gkyziridis: Continuing with sync
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1175 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79296 and previous config saved to /var/cache/conftool/dbconfig/20250717-070609-marostegui.json
  • 07:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:05 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable revertrisk filter for simplewiki and trwiki (T395668) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79295 and previous config saved to /var/cache/conftool/dbconfig/20250717-070338-root.json
  • 07:03 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable revertrisk filter for simplewiki and trwiki (T395668)
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79294 and previous config saved to /var/cache/conftool/dbconfig/20250717-070211-root.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2205 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79293 and previous config saved to /var/cache/conftool/dbconfig/20250717-070112-marostegui.json
  • 07:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79292 and previous config saved to /var/cache/conftool/dbconfig/20250717-064833-root.json
  • 06:39 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 06:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79291 and previous config saved to /var/cache/conftool/dbconfig/20250717-063327-root.json
  • 06:33 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:30 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 06:29 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:29 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 06:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1255.eqiad.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 10 hosts with reason: Maintenance
  • 06:20 marostegui@dns1006: END - running authdns-update
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1255 T399699', diff saved to https://phabricator.wikimedia.org/P79289 and previous config saved to /var/cache/conftool/dbconfig/20250717-061943-marostegui.json
  • 06:19 marostegui@dns1006: START - running authdns-update
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1258 to x3 primary and set section read-write T399699', diff saved to https://phabricator.wikimedia.org/P79288 and previous config saved to /var/cache/conftool/dbconfig/20250717-061832-marostegui.json
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set x3 eqiad as read-only for maintenance - T399699', diff saved to https://phabricator.wikimedia.org/P79287 and previous config saved to /var/cache/conftool/dbconfig/20250717-061800-root.json
  • 06:09 marostegui: Starting x3 eqiad failover from db1255 to db1258 - T399699
  • 06:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Primary switchover x3 T399699
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1258 with weight 0 T399699', diff saved to https://phabricator.wikimedia.org/P79286 and previous config saved to /var/cache/conftool/dbconfig/20250717-060629-root.json
  • 06:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: maintenance
  • 03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T399249)', diff saved to https://phabricator.wikimedia.org/P79285 and previous config saved to /var/cache/conftool/dbconfig/20250717-033528-marostegui.json
  • 03:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P79284 and previous config saved to /var/cache/conftool/dbconfig/20250717-032020-marostegui.json
  • 03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P79283 and previous config saved to /var/cache/conftool/dbconfig/20250717-030511-marostegui.json
  • 02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T399249)', diff saved to https://phabricator.wikimedia.org/P79282 and previous config saved to /var/cache/conftool/dbconfig/20250717-025002-marostegui.json
  • 01:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T399249)', diff saved to https://phabricator.wikimedia.org/P79281 and previous config saved to /var/cache/conftool/dbconfig/20250717-014658-marostegui.json
  • 01:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 01:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T399249)', diff saved to https://phabricator.wikimedia.org/P79280 and previous config saved to /var/cache/conftool/dbconfig/20250717-014635-marostegui.json
  • 01:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P79279 and previous config saved to /var/cache/conftool/dbconfig/20250717-013127-marostegui.json
  • 01:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P79278 and previous config saved to /var/cache/conftool/dbconfig/20250717-011619-marostegui.json
  • 01:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T399249)', diff saved to https://phabricator.wikimedia.org/P79277 and previous config saved to /var/cache/conftool/dbconfig/20250717-010111-marostegui.json
  • 00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T399249)', diff saved to https://phabricator.wikimedia.org/P79276 and previous config saved to /var/cache/conftool/dbconfig/20250717-002056-marostegui.json
  • 00:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T399249)', diff saved to https://phabricator.wikimedia.org/P79275 and previous config saved to /var/cache/conftool/dbconfig/20250717-002045-marostegui.json
  • 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P79274 and previous config saved to /var/cache/conftool/dbconfig/20250717-000537-marostegui.json

2025-07-16

  • 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P79273 and previous config saved to /var/cache/conftool/dbconfig/20250716-235029-marostegui.json
  • 23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T399249)', diff saved to https://phabricator.wikimedia.org/P79271 and previous config saved to /var/cache/conftool/dbconfig/20250716-233522-marostegui.json
  • 22:59 eileen: civicrm upgraded from 71b58ed5 to bf098cc5
  • 22:48 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T399249)', diff saved to https://phabricator.wikimedia.org/P79270 and previous config saved to /var/cache/conftool/dbconfig/20250716-223442-marostegui.json
  • 22:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T399249)', diff saved to https://phabricator.wikimedia.org/P79269 and previous config saved to /var/cache/conftool/dbconfig/20250716-223419-marostegui.json
  • 22:34 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1053.eqiad.wmnet with reason: host reimage
  • 22:28 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1053.eqiad.wmnet with reason: host reimage
  • 22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P79268 and previous config saved to /var/cache/conftool/dbconfig/20250716-221912-marostegui.json
  • 22:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P79267 and previous config saved to /var/cache/conftool/dbconfig/20250716-220405-marostegui.json
  • 21:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T399249)', diff saved to https://phabricator.wikimedia.org/P79266 and previous config saved to /var/cache/conftool/dbconfig/20250716-214857-marostegui.json
  • 21:38 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:28 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:24 bvibber@deploy1003: Finished scap sync-world: Backport for API action=chartinfo internal helper for Charts stats (T393950), API action=chartinfo internal helper for Charts stats (T393950) (duration: 42m 44s)
  • 21:10 bvibber@deploy1003: bvibber: Continuing with sync
  • 21:09 bvibber@deploy1003: bvibber: Backport for API action=chartinfo internal helper for Charts stats (T393950), API action=chartinfo internal helper for Charts stats (T393950) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T399249)', diff saved to https://phabricator.wikimedia.org/P79265 and previous config saved to /var/cache/conftool/dbconfig/20250716-205132-marostegui.json
  • 20:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 20:41 bvibber@deploy1003: Started scap sync-world: Backport for API action=chartinfo internal helper for Charts stats (T393950), API action=chartinfo internal helper for Charts stats (T393950)
  • 20:39 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:39 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:39 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:39 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:39 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 20:39 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 20:38 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 20:38 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 20:38 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 20:38 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 20:38 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 20:38 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 20:31 bvibber@deploy1003: Finished scap sync-world: Backport for CX Translation::getStatus: Fix method to properly return the status (T399732) (duration: 10m 38s)
  • 20:26 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1012.eqiad.wmnet with reason: T396970
  • 20:26 bvibber@deploy1003: bvibber, sbisson: Continuing with sync
  • 20:23 sukhe: sukhe@cumin1003:~$ homer cr3-eqsin* commit "drain codfw transport"
  • 20:23 bvibber@deploy1003: bvibber, sbisson: Backport for CX Translation::getStatus: Fix method to properly return the status (T399732) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:23 sukhe: sukhe@cumin1003:~$ homer cr1-codfw* commit "drain eqsin transport"
  • 20:22 sukhe: drain IC-331929 Arelion eqsin->codfw
  • 20:21 bvibber@deploy1003: Started scap sync-world: Backport for CX Translation::getStatus: Fix method to properly return the status (T399732)
  • 20:18 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@bc4d35c]: push updated rdf-spark-tools 0.3.159 artifact (duration: 00m 20s)
  • 20:18 bvibber@deploy1003: Finished scap sync-world: Backport for cirrus: configure managed cluster list, Pre-deploy Readers Use Cases Survey v2 (T399736) (duration: 09m 04s)
  • 20:18 ebernhardson@deploy1003: Started deploy [airflow-dags/search@bc4d35c]: push updated rdf-spark-tools 0.3.159 artifact
  • 20:12 bvibber@deploy1003: ebernhardson, dani, bvibber: Continuing with sync
  • 20:11 bvibber@deploy1003: ebernhardson, dani, bvibber: Backport for cirrus: configure managed cluster list, Pre-deploy Readers Use Cases Survey v2 (T399736) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:09 bvibber@deploy1003: Started scap sync-world: Backport for cirrus: configure managed cluster list, Pre-deploy Readers Use Cases Survey v2 (T399736)
  • 20:08 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 20:03 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:52 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 19:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 19:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T399249)', diff saved to https://phabricator.wikimedia.org/P79263 and previous config saved to /var/cache/conftool/dbconfig/20250716-193449-marostegui.json
  • 19:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:31 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:30 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:27 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:27 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:27 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:26 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:22 dancy@deploy1003: Finished scap build-images: testing (duration: 01m 11s)
  • 19:21 dancy@deploy1003: Started scap build-images: testing
  • 19:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P79261 and previous config saved to /var/cache/conftool/dbconfig/20250716-191942-marostegui.json
  • 19:19 dancy@deploy1003: Finished scap build-images: Testing T398873 (duration: 04m 34s)
  • 19:14 dancy@deploy1003: Started scap build-images: Testing T398873
  • 19:08 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:08 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:08 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:08 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:08 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:08 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:06 dancy@deploy1003: build-images aborted: (no justification provided) (duration: 02m 09s)
  • 19:05 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:05 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:05 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:05 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:05 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:05 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:04 dancy@deploy1003: Started scap build-images: (no justification provided)
  • 19:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P79260 and previous config saved to /var/cache/conftool/dbconfig/20250716-190434-marostegui.json
  • 18:58 swfrench-wmf: updated all shellbox instances to 2025-07-15-174312 images
  • 18:57 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 18:56 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 18:56 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 18:56 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 18:55 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:55 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:54 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 18:54 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 18:53 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 18:53 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 18:53 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 18:52 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 18:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T399249)', diff saved to https://phabricator.wikimedia.org/P79259 and previous config saved to /var/cache/conftool/dbconfig/20250716-184927-marostegui.json
  • 18:38 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 18:38 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 18:37 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 18:37 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 18:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:35 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 18:35 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 18:34 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 18:34 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 18:34 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 18:33 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 18:21 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 18:21 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.10 refs T392180
  • 18:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 17:58 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 17:52 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cr1-codfw with reason: downtime router before sfp swap
  • 17:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T399249)', diff saved to https://phabricator.wikimedia.org/P79258 and previous config saved to /var/cache/conftool/dbconfig/20250716-175115-marostegui.json
  • 17:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 17:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T399249)', diff saved to https://phabricator.wikimedia.org/P79257 and previous config saved to /var/cache/conftool/dbconfig/20250716-175052-marostegui.json
  • 17:47 topranks: modify BGP attributes to swing pfw1-codfw.wikimedia.org traffic from cr1-codfw to cr2-codfw T399221
  • 17:39 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 17:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P79256 and previous config saved to /var/cache/conftool/dbconfig/20250716-173545-marostegui.json
  • 17:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P79255 and previous config saved to /var/cache/conftool/dbconfig/20250716-172037-marostegui.json
  • 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T399249)', diff saved to https://phabricator.wikimedia.org/P79254 and previous config saved to /var/cache/conftool/dbconfig/20250716-170530-marostegui.json
  • 17:02 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 16:52 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 16:44 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on shwiki and srwiki (T397912) (duration: 08m 33s)
  • 16:43 cwhite: restart corto on alert1002
  • 16:39 zabe@deploy1003: zabe: Continuing with sync
  • 16:38 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on shwiki and srwiki (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:36 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on shwiki and srwiki (T397912)
  • 15:56 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T399249)', diff saved to https://phabricator.wikimedia.org/P79253 and previous config saved to /var/cache/conftool/dbconfig/20250716-155628-marostegui.json
  • 15:56 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 15:56 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T399249)', diff saved to https://phabricator.wikimedia.org/P79252 and previous config saved to /var/cache/conftool/dbconfig/20250716-155605-marostegui.json
  • 15:56 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 15:54 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 15:54 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 15:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P79251 and previous config saved to /var/cache/conftool/dbconfig/20250716-154058-marostegui.json
  • 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P79249 and previous config saved to /var/cache/conftool/dbconfig/20250716-152551-marostegui.json
  • 15:23 gmodena@deploy1003: Finished deploy [analytics/refinery@dc1ba0e] (thin): Regular analytics weekly train THIN [analytics/refinery@dc1ba0e3] (duration: 01m 06s)
  • 15:22 gmodena@deploy1003: Started deploy [analytics/refinery@dc1ba0e] (thin): Regular analytics weekly train THIN [analytics/refinery@dc1ba0e3]
  • 15:22 gmodena@deploy1003: Finished deploy [analytics/refinery@dc1ba0e]: Regular analytics weekly train [analytics/refinery@dc1ba0e3] (duration: 03m 39s)
  • 15:18 gmodena@deploy1003: Started deploy [analytics/refinery@dc1ba0e]: Regular analytics weekly train [analytics/refinery@dc1ba0e3]
  • 15:18 gmodena@deploy1003: Finished deploy [analytics/refinery@dc1ba0e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dc1ba0e3] (duration: 00m 47s)
  • 15:17 gmodena@deploy1003: Started deploy [analytics/refinery@dc1ba0e] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dc1ba0e3]
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79248 and previous config saved to /var/cache/conftool/dbconfig/20250716-151604-root.json
  • 15:12 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T399249)', diff saved to https://phabricator.wikimedia.org/P79247 and previous config saved to /var/cache/conftool/dbconfig/20250716-151044-marostegui.json
  • 15:09 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 15:08 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2015
  • 15:07 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2015
  • 15:02 inflatador: bking@apt1002 publish wmf-opensearch-search-plugins_1.3.20+8_amd64 to component/opensearch13 bullseye-wikimedia
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79246 and previous config saved to /var/cache/conftool/dbconfig/20250716-150059-root.json
  • 14:59 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:48 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79245 and previous config saved to /var/cache/conftool/dbconfig/20250716-144553-root.json
  • 14:41 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:39 btullis@dns1004: END - running authdns-update
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79244 and previous config saved to /var/cache/conftool/dbconfig/20250716-143909-root.json
  • 14:38 btullis@dns1004: START - running authdns-update
  • 14:31 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79243 and previous config saved to /var/cache/conftool/dbconfig/20250716-143048-root.json
  • 14:29 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79242 and previous config saved to /var/cache/conftool/dbconfig/20250716-142404-root.json
  • 14:21 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-codfw
  • 14:21 cmooney@cumin1003: START - Cookbook sre.network.tls for network device ssw1-d8-codfw
  • 14:21 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d1-codfw
  • 14:21 cmooney@cumin1003: START - Cookbook sre.network.tls for network device ssw1-d1-codfw
  • 14:20 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 14:20 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-magru
  • 14:20 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-magru
  • 14:20 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-magru
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2194 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79241 and previous config saved to /var/cache/conftool/dbconfig/20250716-141950-marostegui.json
  • 14:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 14:19 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr1-magru
  • 14:19 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:19 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:18 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d8-codfw
  • 14:18 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d8-codfw
  • 14:18 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d7-codfw
  • 14:18 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d7-codfw
  • 14:18 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d6-codfw
  • 14:18 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d6-codfw
  • 14:18 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d5-codfw
  • 14:18 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d5-codfw
  • 14:18 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d4-codfw
  • 14:18 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d4-codfw
  • 14:17 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d3-codfw
  • 14:17 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
  • 14:17 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d2-codfw
  • 14:17 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d2-codfw
  • 14:17 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-codfw
  • 14:17 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d1-codfw
  • 14:17 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c7-codfw
  • 14:16 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c7-codfw
  • 14:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c6-codfw
  • 14:16 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c6-codfw
  • 14:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c5-codfw
  • 14:16 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c5-codfw
  • 14:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c4-codfw
  • 14:16 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c4-codfw
  • 14:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c3-codfw
  • 14:16 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c3-codfw
  • 14:16 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c2-codfw
  • 14:15 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c2-codfw
  • 14:15 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c1-codfw
  • 14:15 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-c1-codfw
  • 14:15 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:15 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:14 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b4-magru
  • 14:14 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-b4-magru
  • 14:13 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
  • 14:12 cmooney@cumin1003: START - Cookbook sre.network.tls for network device asw1-b3-magru
  • 14:10 elukey@deploy1003: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
  • 14:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79240 and previous config saved to /var/cache/conftool/dbconfig/20250716-140858-root.json
  • 14:06 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T399249)', diff saved to https://phabricator.wikimedia.org/P79239 and previous config saved to /var/cache/conftool/dbconfig/20250716-135641-marostegui.json
  • 13:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79238 and previous config saved to /var/cache/conftool/dbconfig/20250716-135352-root.json
  • 13:51 tappof@dns1004: END - running authdns-update
  • 13:50 tappof@dns1004: START - running authdns-update
  • 13:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:37 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:36 bking@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:35 bking@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:33 bking@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 13:32 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:32 bking@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:32 bking@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:29 bking@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 13:29 bking@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T399249)', diff saved to https://phabricator.wikimedia.org/P79237 and previous config saved to /var/cache/conftool/dbconfig/20250716-132854-marostegui.json
  • 13:26 bking@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:26 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:26 bking@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:19 bking@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:19 bking@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P79236 and previous config saved to /var/cache/conftool/dbconfig/20250716-131347-marostegui.json
  • 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79235 and previous config saved to /var/cache/conftool/dbconfig/20250716-130551-root.json
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P79234 and previous config saved to /var/cache/conftool/dbconfig/20250716-125840-marostegui.json
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79232 and previous config saved to /var/cache/conftool/dbconfig/20250716-125045-root.json
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T399249)', diff saved to https://phabricator.wikimedia.org/P79231 and previous config saved to /var/cache/conftool/dbconfig/20250716-124333-marostegui.json
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79230 and previous config saved to /var/cache/conftool/dbconfig/20250716-124332-root.json
  • 12:43 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:43 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79229 and previous config saved to /var/cache/conftool/dbconfig/20250716-123540-root.json
  • 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79228 and previous config saved to /var/cache/conftool/dbconfig/20250716-122827-root.json
  • 12:26 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:26 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup1007.eqiad.wmnet with reason: Stop minio
  • 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79227 and previous config saved to /var/cache/conftool/dbconfig/20250716-122034-root.json
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T399249)', diff saved to https://phabricator.wikimedia.org/P79226 and previous config saved to /var/cache/conftool/dbconfig/20250716-121900-marostegui.json
  • 12:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 12:16 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:16 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79225 and previous config saved to /var/cache/conftool/dbconfig/20250716-121322-root.json
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1157 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79224 and previous config saved to /var/cache/conftool/dbconfig/20250716-121000-marostegui.json
  • 12:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79223 and previous config saved to /var/cache/conftool/dbconfig/20250716-115816-root.json
  • 11:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T399249)', diff saved to https://phabricator.wikimedia.org/P79222 and previous config saved to /var/cache/conftool/dbconfig/20250716-115131-marostegui.json
  • 11:49 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:49 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2190 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79221 and previous config saved to /var/cache/conftool/dbconfig/20250716-114637-marostegui.json
  • 11:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P79220 and previous config saved to /var/cache/conftool/dbconfig/20250716-113624-marostegui.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P79219 and previous config saved to /var/cache/conftool/dbconfig/20250716-112117-marostegui.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T399249)', diff saved to https://phabricator.wikimedia.org/P79218 and previous config saved to /var/cache/conftool/dbconfig/20250716-110610-marostegui.json
  • 10:49 btullis@dns1004: END - running authdns-update
  • 10:48 btullis@dns1004: START - running authdns-update
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T399249)', diff saved to https://phabricator.wikimedia.org/P79217 and previous config saved to /var/cache/conftool/dbconfig/20250716-104139-marostegui.json
  • 10:41 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T399249)', diff saved to https://phabricator.wikimedia.org/P79216 and previous config saved to /var/cache/conftool/dbconfig/20250716-104117-marostegui.json
  • 10:30 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P79215 and previous config saved to /var/cache/conftool/dbconfig/20250716-102609-marostegui.json
  • 10:24 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79214 and previous config saved to /var/cache/conftool/dbconfig/20250716-102009-root.json
  • 10:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: maintenance
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P79213 and previous config saved to /var/cache/conftool/dbconfig/20250716-101102-marostegui.json
  • 10:10 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 10:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: maintenance
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79212 and previous config saved to /var/cache/conftool/dbconfig/20250716-100504-root.json
  • 10:04 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 10:02 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bookworm
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T399249)', diff saved to https://phabricator.wikimedia.org/P79211 and previous config saved to /var/cache/conftool/dbconfig/20250716-095554-marostegui.json
  • 09:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79210 and previous config saved to /var/cache/conftool/dbconfig/20250716-094958-root.json
  • 09:38 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 25%: 10', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250716-093448-root.json
  • 09:34 jelto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 09:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: maintenance
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T399249)', diff saved to https://phabricator.wikimedia.org/P79209 and previous config saved to /var/cache/conftool/dbconfig/20250716-092942-marostegui.json
  • 09:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T399249)', diff saved to https://phabricator.wikimedia.org/P79208 and previous config saved to /var/cache/conftool/dbconfig/20250716-092919-marostegui.json
  • 09:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2177 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79207 and previous config saved to /var/cache/conftool/dbconfig/20250716-092420-marostegui.json
  • 09:24 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79206 and previous config saved to /var/cache/conftool/dbconfig/20250716-092131-root.json
  • 09:18 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bookworm
  • 09:16 jelto@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gitlab1004.wikimedia.org with OS bookworm
  • 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P79205 and previous config saved to /var/cache/conftool/dbconfig/20250716-091413-marostegui.json
  • 09:09 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:07 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:07 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79204 and previous config saved to /var/cache/conftool/dbconfig/20250716-090625-root.json
  • 09:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:04 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bookworm
  • 09:01 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1189.eqiad.wmnet
  • 09:01 jelto@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gitlab1004.wikimedia.org with OS bookworm
  • 08:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250716-085901-marostegui.json
  • 08:53 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1189.eqiad.wmnet
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79202 and previous config saved to /var/cache/conftool/dbconfig/20250716-085119-root.json
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79201 and previous config saved to /var/cache/conftool/dbconfig/20250716-085015-root.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T399249)', diff saved to https://phabricator.wikimedia.org/P79200 and previous config saved to /var/cache/conftool/dbconfig/20250716-084354-marostegui.json
  • 08:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79198 and previous config saved to /var/cache/conftool/dbconfig/20250716-083614-root.json
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79196 and previous config saved to /var/cache/conftool/dbconfig/20250716-083509-root.json
  • 08:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2156 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79193 and previous config saved to /var/cache/conftool/dbconfig/20250716-082530-marostegui.json
  • 08:25 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:23 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1007.eqiad.wmnet with reason: Stop minio
  • 08:21 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bookworm
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79192 and previous config saved to /var/cache/conftool/dbconfig/20250716-082004-root.json
  • 08:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T399249)', diff saved to https://phabricator.wikimedia.org/P79190 and previous config saved to /var/cache/conftool/dbconfig/20250716-081615-marostegui.json
  • 08:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2214.codfw.wmnet with reason: maintenance
  • 08:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T399249)', diff saved to https://phabricator.wikimedia.org/P79189 and previous config saved to /var/cache/conftool/dbconfig/20250716-081553-marostegui.json
  • 08:15 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bookworm
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T399533', diff saved to https://phabricator.wikimedia.org/P79187 and previous config saved to /var/cache/conftool/dbconfig/20250716-081350-marostegui.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2229 to s6 primary T399533', diff saved to https://phabricator.wikimedia.org/P79186 and previous config saved to /var/cache/conftool/dbconfig/20250716-081302-marostegui.json
  • 08:12 marostegui: Starting s6 codfw failover from db2214 to db2229 - T399533
  • 08:11 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T399533
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2229 with weight 0 T399533', diff saved to https://phabricator.wikimedia.org/P79185 and previous config saved to /var/cache/conftool/dbconfig/20250716-080639-root.json
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79184 and previous config saved to /var/cache/conftool/dbconfig/20250716-080458-root.json
  • 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P79183 and previous config saved to /var/cache/conftool/dbconfig/20250716-080046-marostegui.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2241 T399456', diff saved to https://phabricator.wikimedia.org/P79182 and previous config saved to /var/cache/conftool/dbconfig/20250716-075534-marostegui.json
  • 07:55 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:54 marostegui: Starting x3 codfw failover from db2241 to db2162 - T399456
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2162 to x3 primary T399456', diff saved to https://phabricator.wikimedia.org/P79181 and previous config saved to /var/cache/conftool/dbconfig/20250716-075448-marostegui.json
  • 07:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 07:50 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 07:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Primary switchover x3 T399456
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2162 with weight 0 T399456', diff saved to https://phabricator.wikimedia.org/P79180 and previous config saved to /var/cache/conftool/dbconfig/20250716-074931-root.json
  • 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79179 and previous config saved to /var/cache/conftool/dbconfig/20250716-074855-root.json
  • 07:46 jelto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P79178 and previous config saved to /var/cache/conftool/dbconfig/20250716-074538-marostegui.json
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79177 and previous config saved to /var/cache/conftool/dbconfig/20250716-073349-root.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T399249)', diff saved to https://phabricator.wikimedia.org/P79176 and previous config saved to /var/cache/conftool/dbconfig/20250716-073031-marostegui.json
  • 07:29 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bookworm
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79175 and previous config saved to /var/cache/conftool/dbconfig/20250716-072205-root.json
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79174 and previous config saved to /var/cache/conftool/dbconfig/20250716-071844-root.json
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79173 and previous config saved to /var/cache/conftool/dbconfig/20250716-070659-root.json
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79172 and previous config saved to /var/cache/conftool/dbconfig/20250716-070338-root.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T399249)', diff saved to https://phabricator.wikimedia.org/P79171 and previous config saved to /var/cache/conftool/dbconfig/20250716-070130-marostegui.json
  • 07:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T399249)', diff saved to https://phabricator.wikimedia.org/P79170 and previous config saved to /var/cache/conftool/dbconfig/20250716-070101-marostegui.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1257 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79169 and previous config saved to /var/cache/conftool/dbconfig/20250716-065626-marostegui.json
  • 06:56 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1257.eqiad.wmnet with reason: Maintenance
  • 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79168 and previous config saved to /var/cache/conftool/dbconfig/20250716-065152-root.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79167 and previous config saved to /var/cache/conftool/dbconfig/20250716-064705-root.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P79166 and previous config saved to /var/cache/conftool/dbconfig/20250716-064553-marostegui.json
  • 06:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Upgrade x3 codfw master
  • 06:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Upgrade s6 codfw master
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79165 and previous config saved to /var/cache/conftool/dbconfig/20250716-063646-root.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79164 and previous config saved to /var/cache/conftool/dbconfig/20250716-063159-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P79163 and previous config saved to /var/cache/conftool/dbconfig/20250716-063046-marostegui.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2149 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79162 and previous config saved to /var/cache/conftool/dbconfig/20250716-062537-marostegui.json
  • 06:25 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 06:19 marostegui: Poweroff pc2015 for 10G migration T378715
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79161 and previous config saved to /var/cache/conftool/dbconfig/20250716-061653-root.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T399249)', diff saved to https://phabricator.wikimedia.org/P79160 and previous config saved to /var/cache/conftool/dbconfig/20250716-061539-marostegui.json
  • 06:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: maintenance
  • 06:03 marostegui: Restart mariadb on pc5 T399540
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79159 and previous config saved to /var/cache/conftool/dbconfig/20250716-060148-root.json
  • 06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 05:59 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1256 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79157 and previous config saved to /var/cache/conftool/dbconfig/20250716-055408-marostegui.json
  • 05:54 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1256.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T399249)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250716-054645-marostegui.json
  • 05:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 03:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T399249)', diff saved to https://phabricator.wikimedia.org/P79156 and previous config saved to /var/cache/conftool/dbconfig/20250716-033133-marostegui.json
  • 03:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P79155 and previous config saved to /var/cache/conftool/dbconfig/20250716-031626-marostegui.json
  • 03:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P79154 and previous config saved to /var/cache/conftool/dbconfig/20250716-030119-marostegui.json
  • 02:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T399249)', diff saved to https://phabricator.wikimedia.org/P79153 and previous config saved to /var/cache/conftool/dbconfig/20250716-024611-marostegui.json
  • 02:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T399249)', diff saved to https://phabricator.wikimedia.org/P79152 and previous config saved to /var/cache/conftool/dbconfig/20250716-021956-marostegui.json
  • 02:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 02:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T399249)', diff saved to https://phabricator.wikimedia.org/P79151 and previous config saved to /var/cache/conftool/dbconfig/20250716-021933-marostegui.json
  • 02:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P79150 and previous config saved to /var/cache/conftool/dbconfig/20250716-020426-marostegui.json
  • 01:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P79149 and previous config saved to /var/cache/conftool/dbconfig/20250716-014918-marostegui.json
  • 01:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T399249)', diff saved to https://phabricator.wikimedia.org/P79148 and previous config saved to /var/cache/conftool/dbconfig/20250716-013410-marostegui.json
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T399249)', diff saved to https://phabricator.wikimedia.org/P79147 and previous config saved to /var/cache/conftool/dbconfig/20250716-010617-marostegui.json
  • 01:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 01:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T399249)', diff saved to https://phabricator.wikimedia.org/P79146 and previous config saved to /var/cache/conftool/dbconfig/20250716-010554-marostegui.json
  • 00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P79145 and previous config saved to /var/cache/conftool/dbconfig/20250716-005047-marostegui.json
  • 00:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P79144 and previous config saved to /var/cache/conftool/dbconfig/20250716-003539-marostegui.json
  • 00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T399249)', diff saved to https://phabricator.wikimedia.org/P79143 and previous config saved to /var/cache/conftool/dbconfig/20250716-002031-marostegui.json

2025-07-15

  • 23:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T399249)', diff saved to https://phabricator.wikimedia.org/P79142 and previous config saved to /var/cache/conftool/dbconfig/20250715-235236-marostegui.json
  • 23:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 23:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 23:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T399249)', diff saved to https://phabricator.wikimedia.org/P79141 and previous config saved to /var/cache/conftool/dbconfig/20250715-232640-marostegui.json
  • 23:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P79139 and previous config saved to /var/cache/conftool/dbconfig/20250715-231132-marostegui.json
  • 22:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P79138 and previous config saved to /var/cache/conftool/dbconfig/20250715-225624-marostegui.json
  • 22:50 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 22:49 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 22:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T399249)', diff saved to https://phabricator.wikimedia.org/P79137 and previous config saved to /var/cache/conftool/dbconfig/20250715-224117-marostegui.json
  • 22:27 swfrench-wmf: reprepro include php-excimer_1.2.5-1+wmf11u1 php-imagick_3.7.0-13+wmf11u1 php-luasandbox_4.1.2-1+wmf11u1 php-memcached_3.3.0-1+wmf11u1 php-pcov_1.0.12-1+wmf11u1 php-redis_6.2.0-1+wmf11u1 php-uuid_1.3.0-1+wmf11u1 php-wmerrors_2.0.0-1+wmf11u1 php-yaml_2.2.4-1+wmf11u1 wikidiff2_1.14.1-2+wmf11u1 xdebug_3.4.4-1+wmf11u1 in component/php83 - T398245
  • 22:20 zabe@deploy1003: Finished scap sync-world: Backport for extension-list: Undeploy Interwiki (step 3) (T399636) (duration: 08m 17s)
  • 22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T399249)', diff saved to https://phabricator.wikimedia.org/P79136 and previous config saved to /var/cache/conftool/dbconfig/20250715-221606-marostegui.json
  • 22:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T399249)', diff saved to https://phabricator.wikimedia.org/P79135 and previous config saved to /var/cache/conftool/dbconfig/20250715-221543-marostegui.json
  • 22:15 zabe@deploy1003: zabe: Continuing with sync
  • 22:14 zabe@deploy1003: zabe: Backport for extension-list: Undeploy Interwiki (step 3) (T399636) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:12 zabe@deploy1003: Started scap sync-world: Backport for extension-list: Undeploy Interwiki (step 3) (T399636)
  • 22:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P79134 and previous config saved to /var/cache/conftool/dbconfig/20250715-220036-marostegui.json
  • 21:57 zabe@deploy1003: Finished scap sync-world: Backport for IS: Undeploy Interwiki (step 2) (T399636) (duration: 08m 42s)
  • 21:52 zabe@deploy1003: zabe: Continuing with sync
  • 21:51 zabe@deploy1003: zabe: Backport for IS: Undeploy Interwiki (step 2) (T399636) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:49 zabe@deploy1003: Started scap sync-world: Backport for IS: Undeploy Interwiki (step 2) (T399636)
  • 21:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P79133 and previous config saved to /var/cache/conftool/dbconfig/20250715-214528-marostegui.json
  • 21:40 zabe@deploy1003: Finished scap sync-world: Backport for CS: Undeploy Interwiki (step 1) (T399636) (duration: 08m 55s)
  • 21:35 zabe@deploy1003: zabe: Continuing with sync
  • 21:33 zabe@deploy1003: zabe: Backport for CS: Undeploy Interwiki (step 1) (T399636) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:31 zabe@deploy1003: Started scap sync-world: Backport for CS: Undeploy Interwiki (step 1) (T399636)
  • 21:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T399249)', diff saved to https://phabricator.wikimedia.org/P79132 and previous config saved to /var/cache/conftool/dbconfig/20250715-213021-marostegui.json
  • 21:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 21:17 zabe@deploy1003: Finished scap sync-world: Backport for Also join linktarget on namespace to allow index usage, Also join linktarget on namespace to allow index usage (duration: 08m 12s)
  • 21:12 zabe@deploy1003: zabe: Continuing with sync
  • 21:11 zabe@deploy1003: zabe: Backport for Also join linktarget on namespace to allow index usage, Also join linktarget on namespace to allow index usage synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:10 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 21:09 zabe@deploy1003: Started scap sync-world: Backport for Also join linktarget on namespace to allow index usage, Also join linktarget on namespace to allow index usage
  • 21:07 eileen: civicrm upgraded from 521d0dbe to 71b58ed5
  • 21:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 21:05 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1012.eqiad.wmnet with OS bullseye
  • 21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T399249)', diff saved to https://phabricator.wikimedia.org/P79131 and previous config saved to /var/cache/conftool/dbconfig/20250715-210251-marostegui.json
  • 21:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T399249)', diff saved to https://phabricator.wikimedia.org/P79130 and previous config saved to /var/cache/conftool/dbconfig/20250715-210240-marostegui.json
  • 20:50 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 20:50 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 20:48 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: host reimage
  • 20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P79129 and previous config saved to /var/cache/conftool/dbconfig/20250715-204732-marostegui.json
  • 20:44 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: host reimage
  • 20:39 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 20:33 zabe@deploy1003: Finished scap sync-world: Backport for Revert^2 "initialiseSettings: set wgSecurePollUseMediaWikiNamespace = true for enwiki", Undeploy Readers Use Cases Survey (T398870) (duration: 08m 27s)
  • 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P79128 and previous config saved to /var/cache/conftool/dbconfig/20250715-203224-marostegui.json
  • 20:30 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 20:28 zabe@deploy1003: dani, zabe: Continuing with sync
  • 20:27 zabe@deploy1003: dani, zabe: Backport for Revert^2 "initialiseSettings: set wgSecurePollUseMediaWikiNamespace = true for enwiki", Undeploy Readers Use Cases Survey (T398870) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:25 zabe@deploy1003: Started scap sync-world: Backport for Revert^2 "initialiseSettings: set wgSecurePollUseMediaWikiNamespace = true for enwiki", Undeploy Readers Use Cases Survey (T398870)
  • 20:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 20:22 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 20:21 zabe@deploy1003: Sync cancelled.
  • 20:19 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T399249)', diff saved to https://phabricator.wikimedia.org/P79127 and previous config saved to /var/cache/conftool/dbconfig/20250715-201715-marostegui.json
  • 20:09 zabe@deploy1003: novemlinguae, zabe: Backport for Revert "initialiseSettings: set wgSecurePollUseMediaWikiNamespace = true for enwiki" (T398080 T399372) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:07 zabe@deploy1003: Started scap sync-world: Backport for Revert "initialiseSettings: set wgSecurePollUseMediaWikiNamespace = true for enwiki" (T398080 T399372)
  • 20:07 jgleeson: SmashPig upgraded from 82dbf9ff to 0bc0c1ec
  • 20:05 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 20:05 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:53 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:53 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T399249)', diff saved to https://phabricator.wikimedia.org/P79125 and previous config saved to /var/cache/conftool/dbconfig/20250715-194704-marostegui.json
  • 19:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T399249)', diff saved to https://phabricator.wikimedia.org/P79124 and previous config saved to /var/cache/conftool/dbconfig/20250715-194642-marostegui.json
  • 19:42 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:41 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:33 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P79123 and previous config saved to /var/cache/conftool/dbconfig/20250715-193134-marostegui.json
  • 19:20 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1012.eqiad.wmnet with OS bullseye
  • 19:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P79122 and previous config saved to /var/cache/conftool/dbconfig/20250715-191627-marostegui.json
  • 19:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T399249)', diff saved to https://phabricator.wikimedia.org/P79121 and previous config saved to /var/cache/conftool/dbconfig/20250715-190120-marostegui.json
  • 18:52 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1012.eqiad.wmnet with OS bullseye
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T399249)', diff saved to https://phabricator.wikimedia.org/P79120 and previous config saved to /var/cache/conftool/dbconfig/20250715-183047-marostegui.json
  • 18:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:19 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.10 refs T392180
  • 18:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bookworm
  • 18:10 inflatador: bking@build2001 /srv/deployment/docker-pkg/venv/bin/docker-pkg -c /etc/production-images/config.yaml build images/ --select '*flink*' T398159
  • 18:01 swfrench@deploy1003: Finished scap sync-world: Stop building buster-based webserver flavour images - T378128 (duration: 02m 21s)
  • 17:58 swfrench@deploy1003: Started scap sync-world: Stop building buster-based webserver flavour images - T378128
  • 17:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 17:51 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 17:49 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bookworm
  • 17:34 swfrench@deploy1003: Finished scap sync-world: Rebuild to pick up new php8.1 production image (duration: 34m 16s)
  • 17:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm
  • 17:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 17:14 brett@cumin2002: START - Cookbook sre.hosts.provision for host lvs1017.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 17:09 brett@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017
  • 17:09 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017
  • 17:09 fceratto@cumin1002: dbctl commit (dc=all): 'Set es1032 back as master', diff saved to https://phabricator.wikimedia.org/P79119 and previous config saved to /var/cache/conftool/dbconfig/20250715-170919-fceratto.json
  • 17:07 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 fceratto@cumin1002: dbctl commit (dc=all): 'Pooling in after update es1032', diff saved to https://phabricator.wikimedia.org/P79118 and previous config saved to /var/cache/conftool/dbconfig/20250715-170724-fceratto.json
  • 17:04 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 17:04 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 17:04 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 17:01 swfrench@deploy1003: Started scap sync-world: Rebuild to pick up new php8.1 production image
  • 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'update es1032', diff saved to https://phabricator.wikimedia.org/P79117 and previous config saved to /var/cache/conftool/dbconfig/20250715-165930-fceratto.json
  • 16:58 brett@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1017
  • 16:57 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1017
  • 16:40 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 16:36 mutante: downtiming es1032 for 3 days - expired downtime for T391921?
  • 16:36 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es1032.eqiad.wmnet with reason: T391921
  • 16:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm
  • 16:21 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 16:10 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db[2160,2234].codfw.wmnet,db[1217,1250].eqiad.wmnet
  • 16:10 jynus@cumin1003: START - Cookbook sre.hosts.remove-downtime for db[2160,2234].codfw.wmnet,db[1217,1250].eqiad.wmnet
  • 15:52 btullis@dns1004: END - running authdns-update
  • 15:52 btullis@dns1004: START - running authdns-update
  • 15:47 jynus: start replica @ db1217:m3, db2160:m3 T370266
  • 15:42 mutante: phabricator version upgrade finished
  • 15:29 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 15:28 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 15:14 btullis@dns1004: END - running authdns-update
  • 15:13 btullis@dns1004: START - running authdns-update
  • 15:12 brennen@deploy1003: Finished deploy [phabricator/deployment@ed8270c]: deploy phab1004 for T370266 (duration: 00m 30s)
  • 15:12 brennen@deploy1003: Started deploy [phabricator/deployment@ed8270c]: deploy phab1004 for T370266
  • 15:11 mutante: phabricator version upgrade in progress - expect short downtime
  • 15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@ed8270c]: test deploy phab2002 for T370266 (duration: 00m 38s)
  • 15:09 brennen@deploy1003: Started deploy [phabricator/deployment@ed8270c]: test deploy phab2002 for T370266
  • 15:03 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet with reason: version upgrade
  • 15:03 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab1004.eqiad.wmnet with reason: version upgrade
  • 15:02 jynus: stop replica @ db1217:m3, db2160:m3 T370266
  • 15:00 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db[2160,2234].codfw.wmnet,db[1217,1250].eqiad.wmnet with reason: Phorge upgrade
  • 14:57 dancy@deploy1003: Installation of scap version "4.188.2" completed for 1 hosts
  • 14:56 dancy@deploy1003: Installing scap version "4.188.2" for 1 host(s)
  • 14:36 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:36 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:33 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:33 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:33 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:30 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 14:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 14:23 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 14:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T399249)', diff saved to https://phabricator.wikimedia.org/P79114 and previous config saved to /var/cache/conftool/dbconfig/20250715-142234-marostegui.json
  • 14:22 swfrench-wmf: reprepro include php8.1_8.1.33-1+wmf11u1 in component/php81
  • 14:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P79113 and previous config saved to /var/cache/conftool/dbconfig/20250715-140726-marostegui.json
  • 14:06 swfrench-wmf: reprepro include php8.3_8.3.23-1+wmf11u2 in component/php83 - T398245
  • 14:06 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 13:55 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 13:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P79112 and previous config saved to /var/cache/conftool/dbconfig/20250715-135219-marostegui.json
  • 13:43 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:41 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:37 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bullseye
  • 13:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T399249)', diff saved to https://phabricator.wikimedia.org/P79111 and previous config saved to /var/cache/conftool/dbconfig/20250715-133712-marostegui.json
  • 13:33 hashar@deploy1003: Finished deploy [releng/jenkins-deploy@ea02eb9] (releasing): jenkins-rel: update plugins to address vulnerabilities - T399154 (duration: 00m 35s)
  • 13:32 hashar@deploy1003: Started deploy [releng/jenkins-deploy@ea02eb9] (releasing): jenkins-rel: update plugins to address vulnerabilities - T399154
  • 13:32 hashar@deploy1003: Finished deploy [releng/jenkins-deploy@ea02eb9] (releasing): jenkins-rel: update plugins to address vulnerabilities - T399154 (duration: 00m 53s)
  • 13:31 hashar@deploy1003: Started deploy [releng/jenkins-deploy@ea02eb9] (releasing): jenkins-rel: update plugins to address vulnerabilities - T399154
  • 13:31 hashar@deploy1003: Finished deploy [releng/jenkins-deploy@ea02eb9] (releasing): jenkins-rel: update plugins to address vulnerabilities - T399154 (duration: 01m 43s)
  • 13:29 hashar@deploy1003: Started deploy [releng/jenkins-deploy@ea02eb9] (releasing): jenkins-rel: update plugins to address vulnerabilities - T399154
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T399249)', diff saved to https://phabricator.wikimedia.org/P79110 and previous config saved to /var/cache/conftool/dbconfig/20250715-131450-marostegui.json
  • 13:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1002: dbctl commit (dc=all): 'Remove weight from the master T395771', diff saved to https://phabricator.wikimedia.org/P79109 and previous config saved to /var/cache/conftool/dbconfig/20250715-125157-marostegui.json
  • 12:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139009
  • 12:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T399249)', diff saved to https://phabricator.wikimedia.org/P79108 and previous config saved to /var/cache/conftool/dbconfig/20250715-123357-marostegui.json
  • 12:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 139009
  • 12:23 XioNoX: update AS14907 RIPE import/export policies
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P79105 and previous config saved to /var/cache/conftool/dbconfig/20250715-121849-marostegui.json
  • 12:17 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
  • 12:16 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:16 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:15 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:14 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:14 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P79102 and previous config saved to /var/cache/conftool/dbconfig/20250715-120340-marostegui.json
  • 11:51 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
  • 11:51 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 36351
  • 11:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
  • 11:50 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16347
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T399249)', diff saved to https://phabricator.wikimedia.org/P79101 and previous config saved to /var/cache/conftool/dbconfig/20250715-114833-marostegui.json
  • 11:45 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
  • 11:41 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
  • 11:41 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
  • 11:34 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
  • 11:34 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
  • 11:26 jynus: restart atftp daemon @ install2004, it had crashed
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T399249)', diff saved to https://phabricator.wikimedia.org/P79100 and previous config saved to /var/cache/conftool/dbconfig/20250715-112225-marostegui.json
  • 11:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T399249)', diff saved to https://phabricator.wikimedia.org/P79099 and previous config saved to /var/cache/conftool/dbconfig/20250715-112202-marostegui.json
  • 11:17 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on jawiki and ruwiki (T397912) (duration: 08m 46s)
  • 11:11 zabe@deploy1003: zabe: Continuing with sync
  • 11:10 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on jawiki and ruwiki (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:08 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on jawiki and ruwiki (T397912)
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P79098 and previous config saved to /var/cache/conftool/dbconfig/20250715-110655-marostegui.json
  • 11:03 fceratto@cumin1002: dbctl commit (dc=eqiad): 'Configure db1259', diff saved to https://phabricator.wikimedia.org/P79097 and previous config saved to /var/cache/conftool/dbconfig/20250715-110322-fceratto.json
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P79096 and previous config saved to /var/cache/conftool/dbconfig/20250715-105148-marostegui.json
  • 10:48 mszabo@deploy1003: Finished scap sync-world: Backport for Register mediawiki.product_metrics.special_create_account stream (T394744) (duration: 15m 19s)
  • 10:40 mszabo@deploy1003: mszabo: Continuing with sync
  • 10:36 mszabo@deploy1003: mszabo: Backport for Register mediawiki.product_metrics.special_create_account stream (T394744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T399249)', diff saved to https://phabricator.wikimedia.org/P79093 and previous config saved to /var/cache/conftool/dbconfig/20250715-103641-marostegui.json
  • 10:32 mszabo@deploy1003: Started scap sync-world: Backport for Register mediawiki.product_metrics.special_create_account stream (T394744)
  • 10:28 mszabo@deploy1003: Sync cancelled.
  • 10:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
  • 10:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79090 and previous config saved to /var/cache/conftool/dbconfig/20250715-102500-root.json
  • 10:23 mszabo@deploy1003: mszabo: Backport for Configure Special:CreateAccount instrument (T394744) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:22 moritzm: installing debian-archive-keyring updates from Bookworm point release
  • 10:20 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
  • 10:19 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
  • 10:19 mszabo@deploy1003: Started scap sync-world: Backport for Configure Special:CreateAccount instrument (T394744)
  • 10:17 XioNoX: magru: setup BGP to Ufinet - T389767
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T399249)', diff saved to https://phabricator.wikimedia.org/P79088 and previous config saved to /var/cache/conftool/dbconfig/20250715-101135-marostegui.json
  • 10:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T399249)', diff saved to https://phabricator.wikimedia.org/P79087 and previous config saved to /var/cache/conftool/dbconfig/20250715-101113-marostegui.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79086 and previous config saved to /var/cache/conftool/dbconfig/20250715-100955-root.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P79085 and previous config saved to /var/cache/conftool/dbconfig/20250715-100607-root.json
  • 10:06 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:05 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:05 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:05 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:04 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:04 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:56 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:56 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P79084 and previous config saved to /var/cache/conftool/dbconfig/20250715-095605-marostegui.json
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P79083 and previous config saved to /var/cache/conftool/dbconfig/20250715-095449-root.json
  • 09:51 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=eqiad
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P79082 and previous config saved to /var/cache/conftool/dbconfig/20250715-095101-root.json
  • 09:48 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:48 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 09:47 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 09:47 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:42 btullis@cumin1003: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
  • 09:41 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 09:41 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P79080 and previous config saved to /var/cache/conftool/dbconfig/20250715-094058-marostegui.json
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P79079 and previous config saved to /var/cache/conftool/dbconfig/20250715-093943-root.json
  • 09:39 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 09:39 marostegui: Restart mariadb on pc3 T399540
  • 09:39 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 09:38 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:38 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 09:38 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 09:36 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P79076 and previous config saved to /var/cache/conftool/dbconfig/20250715-093556-root.json
  • 09:34 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:33 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1258 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P79074 and previous config saved to /var/cache/conftool/dbconfig/20250715-093200-marostegui.json
  • 09:31 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1258.eqiad.wmnet with reason: Maintenance
  • 09:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:30 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:29 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:28 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
  • 09:27 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T399249)', diff saved to https://phabricator.wikimedia.org/P79073 and previous config saved to /var/cache/conftool/dbconfig/20250715-092551-marostegui.json
  • 09:24 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:21 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P79072 and previous config saved to /var/cache/conftool/dbconfig/20250715-092050-root.json
  • 09:19 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 09:19 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 09:19 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 09:18 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:17 marostegui: Restart mariadb on pc2 T399540
  • 09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 09:16 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 09:14 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1246 T399449', diff saved to https://phabricator.wikimedia.org/P79068 and previous config saved to /var/cache/conftool/dbconfig/20250715-091328-marostegui.json
  • 09:13 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 09:12 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bookworm
  • 09:11 btullis@cumin1003: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
  • 09:10 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1185.eqiad.wmnet onto db1230.eqiad.wmnet
  • 09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1185 gradually with 4 steps - Pool db1185.eqiad.wmnet in after cloning
  • 09:01 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 09:01 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T399249)', diff saved to https://phabricator.wikimedia.org/P79062 and previous config saved to /var/cache/conftool/dbconfig/20250715-090055-marostegui.json
  • 09:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T399249)', diff saved to https://phabricator.wikimedia.org/P79061 and previous config saved to /var/cache/conftool/dbconfig/20250715-090021-marostegui.json
  • 08:54 marostegui: Restart mariadb on pc1 T399540
  • 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 08:53 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 08:48 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P79058 and previous config saved to /var/cache/conftool/dbconfig/20250715-084513-marostegui.json
  • 08:43 jelto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
  • 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P79054 and previous config saved to /var/cache/conftool/dbconfig/20250715-083006-marostegui.json
  • 08:27 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:27 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bookworm
  • 08:26 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=inference,name=eqiad
  • 08:23 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1185 gradually with 4 steps - Pool db1185.eqiad.wmnet in after cloning
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T399249)', diff saved to https://phabricator.wikimedia.org/P79051 and previous config saved to /var/cache/conftool/dbconfig/20250715-081458-marostegui.json
  • 07:51 XioNoX: more Bird test on ganeti2034 & testvm2006 - T362392
  • 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T399249)', diff saved to https://phabricator.wikimedia.org/P79049 and previous config saved to /var/cache/conftool/dbconfig/20250715-074851-marostegui.json
  • 07:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T399249)', diff saved to https://phabricator.wikimedia.org/P79048 and previous config saved to /var/cache/conftool/dbconfig/20250715-074829-marostegui.json
  • 07:38 vgutierrez: use GTS alt chain for the measure cert on cp[7013-7016] - T398596
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P79047 and previous config saved to /var/cache/conftool/dbconfig/20250715-073322-marostegui.json
  • 07:20 moritzm: installing rubygems security updates
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P79046 and previous config saved to /var/cache/conftool/dbconfig/20250715-071813-marostegui.json
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T399249)', diff saved to https://phabricator.wikimedia.org/P79045 and previous config saved to /var/cache/conftool/dbconfig/20250715-070305-marostegui.json
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T399249)', diff saved to https://phabricator.wikimedia.org/P79044 and previous config saved to /var/cache/conftool/dbconfig/20250715-063651-marostegui.json
  • 06:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1185 - Depool db1185.eqiad.wmnet to then clone it to db1230.eqiad.wmnet - marostegui@cumin1002
  • 06:29 marostegui@cumin1002: START - Cookbook sre.mysql.depool db1185 - Depool db1185.eqiad.wmnet to then clone it to db1230.eqiad.wmnet - marostegui@cumin1002
  • 06:29 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1185.eqiad.wmnet onto db1230.eqiad.wmnet
  • 06:29 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 2394 hosts
  • 06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1230.eqiad.wmnet with reason: maintenance
  • 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1230 T399446', diff saved to https://phabricator.wikimedia.org/P79042 and previous config saved to /var/cache/conftool/dbconfig/20250715-060600-root.json
  • 06:05 marostegui@dns1006: END - running authdns-update
  • 06:04 marostegui@dns1006: START - running authdns-update
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1210 to s5 primary and set section read-write T399446', diff saved to https://phabricator.wikimedia.org/P79041 and previous config saved to /var/cache/conftool/dbconfig/20250715-060223-marostegui.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T399446', diff saved to https://phabricator.wikimedia.org/P79040 and previous config saved to /var/cache/conftool/dbconfig/20250715-060114-root.json
  • 05:54 marostegui: Starting s5 eqiad failover from db1230 to db1210 - T399446
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1210 with weight 0 T399446', diff saved to https://phabricator.wikimedia.org/P79039 and previous config saved to /var/cache/conftool/dbconfig/20250715-055011-root.json
  • 05:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T399446
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.7 (duration: 01m 42s)
  • 03:48 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.10 refs T392180 (duration: 45m 36s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.10 refs T392180

2025-07-14

  • 23:53 zabe@deploy1003: Finished scap sync-world: Backport for Disable categorylinks read new on wikis which depend on missing index (duration: 09m 09s)
  • 23:47 zabe@deploy1003: zabe: Continuing with sync
  • 23:45 zabe@deploy1003: zabe: Backport for Disable categorylinks read new on wikis which depend on missing index synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:44 zabe@deploy1003: Started scap sync-world: Backport for Disable categorylinks read new on wikis which depend on missing index
  • 23:35 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on more wikis (T397912) (duration: 08m 26s)
  • 23:29 zabe@deploy1003: zabe: Continuing with sync
  • 23:28 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on more wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:27 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on more wikis (T397912)
  • 22:28 ryankemper@cumin1003: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 21:28 ryankemper@cumin1003: START - Cookbook sre.wdqs.restart
  • 20:57 dancy@deploy1003: Installation of scap version "4.188.2" completed for 2 hosts
  • 20:56 dancy@deploy1003: Installing scap version "4.188.2" for 2 host(s)
  • 20:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bookworm
  • 20:49 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 20:46 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Readers Use Cases Survey: Set token param name (T398870) (duration: 09m 30s)
  • 20:45 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
  • 20:40 dreamyjazz@deploy1003: dani, dreamyjazz: Continuing with sync
  • 20:38 dreamyjazz@deploy1003: dani, dreamyjazz: Backport for Readers Use Cases Survey: Set token param name (T398870) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:36 dreamyjazz@deploy1003: Started scap sync-world: Backport for Readers Use Cases Survey: Set token param name (T398870)
  • 20:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 20:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 20:24 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set hCaptcha config (T382148) (duration: 13m 14s)
  • 20:18 dreamyjazz@deploy1003: dreamyjazz, reedy: Continuing with sync
  • 20:13 dreamyjazz@deploy1003: dreamyjazz, reedy: Backport for Set hCaptcha config (T382148) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:11 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set hCaptcha config (T382148)
  • 20:08 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bookworm
  • 20:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:58 dancy@deploy1003: Installing scap version "4.188.0" for 1 host(s)
  • 19:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T399249)', diff saved to https://phabricator.wikimedia.org/P79037 and previous config saved to /var/cache/conftool/dbconfig/20250714-185800-marostegui.json
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P79036 and previous config saved to /var/cache/conftool/dbconfig/20250714-184253-marostegui.json
  • 18:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P79035 and previous config saved to /var/cache/conftool/dbconfig/20250714-182745-marostegui.json
  • 18:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T399249)', diff saved to https://phabricator.wikimedia.org/P79034 and previous config saved to /var/cache/conftool/dbconfig/20250714-181238-marostegui.json
  • 17:59 eevans@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on aqs1012.eqiad.wmnet with reason: Drive replacement
  • 17:51 sukhe: sudo cumin -b31 "A:cp" "run-puppet-agent --enable 'merging CR 1167686'"
  • 17:49 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1167686'"
  • 17:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2229 (T399249)', diff saved to https://phabricator.wikimedia.org/P79032 and previous config saved to /var/cache/conftool/dbconfig/20250714-173554-marostegui.json
  • 17:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 17:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T399249)', diff saved to https://phabricator.wikimedia.org/P79031 and previous config saved to /var/cache/conftool/dbconfig/20250714-173531-marostegui.json
  • 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P79029 and previous config saved to /var/cache/conftool/dbconfig/20250714-170517-marostegui.json
  • 16:50 dancy@deploy1003: Installation of scap version "4.188.1" completed for 2 hosts
  • 16:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T399249)', diff saved to https://phabricator.wikimedia.org/P79028 and previous config saved to /var/cache/conftool/dbconfig/20250714-165009-marostegui.json
  • 16:48 dancy@deploy1003: Installing scap version "4.188.1" for 2 host(s)
  • 16:39 fceratto@cumin1002: dbctl restore of MediaWiki config (dc=all) from a
  • 16:32 dancy@deploy1003: Installing scap version "4.188.0" for 1 host(s)
  • 16:27 dancy@deploy1003: Installing scap version "4.188.0" for 180 host(s)
  • 16:25 dancy@deploy1003: Installing scap version "4.188.0" for 2 host(s)
  • 16:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 16:24 dancy@deploy1003: Installing scap version "4.188.0" for 180 host(s)
  • 16:19 zabe@deploy1003: Finished scap sync-world: Backport for Fix join conditions in categorylinks read new code (T399431) (duration: 08m 04s)
  • 16:14 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 16:14 zabe@deploy1003: zabe: Continuing with sync
  • 16:13 zabe@deploy1003: zabe: Backport for Fix join conditions in categorylinks read new code (T399431) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:11 zabe@deploy1003: Started scap sync-world: Backport for Fix join conditions in categorylinks read new code (T399431)
  • 16:06 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 16:01 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2001
  • 16:01 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest2001
  • 15:56 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 15:46 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
  • 15:46 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
  • 15:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1259 gradually with 4 steps - Pooling in
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T399249)', diff saved to https://phabricator.wikimedia.org/P79025 and previous config saved to /var/cache/conftool/dbconfig/20250714-154346-marostegui.json
  • 15:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T399249)', diff saved to https://phabricator.wikimedia.org/P79024 and previous config saved to /var/cache/conftool/dbconfig/20250714-154322-marostegui.json
  • 15:37 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P79022 and previous config saved to /var/cache/conftool/dbconfig/20250714-152815-marostegui.json
  • 15:27 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: sync
  • 15:22 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=tegola-vector-tiles,name=codfw
  • 15:21 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=tegola,name=codfw
  • 15:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
  • 15:15 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1259 gradually with 4 steps - Pooling in
  • 15:15 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1259 gradually with 4 steps - Pooling in
  • 15:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P79020 and previous config saved to /var/cache/conftool/dbconfig/20250714-151308-marostegui.json
  • 15:06 sukhe@dns1004: END - running authdns-update
  • 15:05 sukhe@dns1004: START - running authdns-update
  • 15:04 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.magru.wmnet [reason: repooling after reboot]
  • 15:02 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1259 gradually with 4 steps - Pooling in
  • 15:01 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1259.eqiad.wmnet
  • 15:01 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1259.eqiad.wmnet
  • 14:59 btullis@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T399249)', diff saved to https://phabricator.wikimedia.org/P79018 and previous config saved to /var/cache/conftool/dbconfig/20250714-145800-marostegui.json
  • 14:55 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent' :T399114
  • 14:54 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
  • 14:50 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
  • 14:33 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1167695'"
  • 14:32 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1259 gradually with 4 steps - Pooling in
  • 14:32 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1259 gradually with 4 steps - Pooling in
  • 14:31 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1259 gradually with 4 steps - Pooling in
  • 14:31 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
  • 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T399249)', diff saved to https://phabricator.wikimedia.org/P79017 and previous config saved to /var/cache/conftool/dbconfig/20250714-142103-marostegui.json
  • 14:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 14:08 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 14:08 btullis@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:07 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 14:06 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:05 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for typeahead: Add hook to augment api parameters (T397732), search: Augment typeahead url with test parameters (T397732) (duration: 09m 27s)
  • 13:59 lucaswerkmeister-wmde@deploy1003: ebernhardson, lucaswerkmeister-wmde: Continuing with sync
  • 13:58 lucaswerkmeister-wmde@deploy1003: ebernhardson, lucaswerkmeister-wmde: Backport for typeahead: Add hook to augment api parameters (T397732), search: Augment typeahead url with test parameters (T397732) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:56 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for typeahead: Add hook to augment api parameters (T397732), search: Augment typeahead url with test parameters (T397732)
  • 13:54 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1259 gradually with 4 steps - Pooling in
  • 13:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1259 - Pooling in
  • 13:50 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1259 - Pooling in
  • 13:50 sukhe: sudo cumin -b11 "A:cp" "run-puppet-agent --enable 'merging CR 1167266'"
  • 13:47 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:47 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:44 elukey: roll restart eventgate-main pods to pick up a new stream - T381565
  • 13:44 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 13:44 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T399249)', diff saved to https://phabricator.wikimedia.org/P79016 and previous config saved to /var/cache/conftool/dbconfig/20250714-134026-marostegui.json
  • 13:36 sukhe: sudo cumin "A:cp" "disable-puppet 'merging CR 1167266'"
  • 13:35 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: add the maps.tiles_change_bookworm stream (T381565) (duration: 07m 59s)
  • 13:30 lucaswerkmeister-wmde@deploy1003: elukey, lucaswerkmeister-wmde: Continuing with sync
  • 13:29 lucaswerkmeister-wmde@deploy1003: elukey, lucaswerkmeister-wmde: Backport for EventStreamConfig: add the maps.tiles_change_bookworm stream (T381565) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:27 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for EventStreamConfig: add the maps.tiles_change_bookworm stream (T381565)
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P79015 and previous config saved to /var/cache/conftool/dbconfig/20250714-132518-marostegui.json
  • 13:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [Growth]: make limiting add a link available to all wikis (T396382) (duration: 08m 28s)
  • 13:19 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, sgimeno: Continuing with sync
  • 13:18 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, sgimeno: Backport for [Growth]: make limiting add a link available to all wikis (T396382) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:17 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1259.eqiad.wmnet
  • 13:17 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1233 gradually with 4 steps - Pool db1233.eqiad.wmnet in after cloning
  • 13:16 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [Growth]: make limiting add a link available to all wikis (T396382)
  • 13:14 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Deploy Readers Use Cases Survey on enwiki (T398870) (duration: 10m 34s)
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P79013 and previous config saved to /var/cache/conftool/dbconfig/20250714-131011-marostegui.json
  • 13:09 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, dani: Continuing with sync
  • 13:08 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, dani: Backport for Deploy Readers Use Cases Survey on enwiki (T398870) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Deploy Readers Use Cases Survey on enwiki (T398870)
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 100%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P79011 and previous config saved to /var/cache/conftool/dbconfig/20250714-125512-root.json
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T399249)', diff saved to https://phabricator.wikimedia.org/P79010 and previous config saved to /var/cache/conftool/dbconfig/20250714-125504-marostegui.json
  • 12:54 btullis@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:40 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: MariaDB package upgrade
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 90%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P79007 and previous config saved to /var/cache/conftool/dbconfig/20250714-124006-root.json
  • 12:38 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 12:38 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 12:34 kart_: machinetranslation: Use s3 model storage for production (T335491)
  • 12:33 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:31 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1233 gradually with 4 steps - Pool db1233.eqiad.wmnet in after cloning
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T399249)', diff saved to https://phabricator.wikimedia.org/P79005 and previous config saved to /var/cache/conftool/dbconfig/20250714-122914-marostegui.json
  • 12:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T399249)', diff saved to https://phabricator.wikimedia.org/P79004 and previous config saved to /var/cache/conftool/dbconfig/20250714-122852-marostegui.json
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P79003 and previous config saved to /var/cache/conftool/dbconfig/20250714-122724-root.json
  • 12:27 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 75%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P79002 and previous config saved to /var/cache/conftool/dbconfig/20250714-122500-root.json
  • 12:24 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:19 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P79001 and previous config saved to /var/cache/conftool/dbconfig/20250714-121344-marostegui.json
  • 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P79000 and previous config saved to /var/cache/conftool/dbconfig/20250714-121218-root.json
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 60%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78999 and previous config saved to /var/cache/conftool/dbconfig/20250714-120955-root.json
  • 12:02 samtar@deploy1003: Finished scap sync-world: Backport for IS: Set wgTemplateDataEnableCategoryBrowser default enabled (T391064) (duration: 35m 30s)
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P78998 and previous config saved to /var/cache/conftool/dbconfig/20250714-115836-marostegui.json
  • 11:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78997 and previous config saved to /var/cache/conftool/dbconfig/20250714-115713-root.json
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 50%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78996 and previous config saved to /var/cache/conftool/dbconfig/20250714-115449-root.json
  • 11:48 samtar@deploy1003: samtar: Continuing with sync
  • 11:48 samtar@deploy1003: samtar: Backport for IS: Set wgTemplateDataEnableCategoryBrowser default enabled (T391064) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2244 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78995 and previous config saved to /var/cache/conftool/dbconfig/20250714-114350-root.json
  • 11:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T399249)', diff saved to https://phabricator.wikimedia.org/P78994 and previous config saved to /var/cache/conftool/dbconfig/20250714-114329-marostegui.json
  • 11:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78993 and previous config saved to /var/cache/conftool/dbconfig/20250714-114207-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 30%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78992 and previous config saved to /var/cache/conftool/dbconfig/20250714-113943-root.json
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2162 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78991 and previous config saved to /var/cache/conftool/dbconfig/20250714-113418-marostegui.json
  • 11:34 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2244 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78990 and previous config saved to /var/cache/conftool/dbconfig/20250714-112844-root.json
  • 11:26 samtar@deploy1003: Started scap sync-world: Backport for IS: Set wgTemplateDataEnableCategoryBrowser default enabled (T391064)
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 25%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78989 and previous config saved to /var/cache/conftool/dbconfig/20250714-112438-root.json
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 100%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78988 and previous config saved to /var/cache/conftool/dbconfig/20250714-111808-root.json
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T399249)', diff saved to https://phabricator.wikimedia.org/P78987 and previous config saved to /var/cache/conftool/dbconfig/20250714-111410-marostegui.json
  • 11:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T399249)', diff saved to https://phabricator.wikimedia.org/P78986 and previous config saved to /var/cache/conftool/dbconfig/20250714-111346-marostegui.json
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2244 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78985 and previous config saved to /var/cache/conftool/dbconfig/20250714-111339-root.json
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 10%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78984 and previous config saved to /var/cache/conftool/dbconfig/20250714-110932-root.json
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 90%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78983 and previous config saved to /var/cache/conftool/dbconfig/20250714-110302-root.json
  • 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P78982 and previous config saved to /var/cache/conftool/dbconfig/20250714-105839-marostegui.json
  • 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2244 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78981 and previous config saved to /var/cache/conftool/dbconfig/20250714-105833-root.json
  • 10:57 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest2008.codfw.wmnet
  • 10:57 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:57 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:57 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2008.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:55 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1259.eqiad.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 5%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78980 and previous config saved to /var/cache/conftool/dbconfig/20250714-105427-root.json
  • 10:54 fceratto@cumin1002: dbctl commit (dc=all): 'Set db1259 T393296', diff saved to https://phabricator.wikimedia.org/P78979 and previous config saved to /var/cache/conftool/dbconfig/20250714-105416-fceratto.json
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2244 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78978 and previous config saved to /var/cache/conftool/dbconfig/20250714-105118-marostegui.json
  • 10:51 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2244.codfw.wmnet with reason: Maintenance
  • 10:51 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:49 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1233.eqiad.wmnet onto db1259.eqiad.wmnet
  • 10:49 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1233 - Depool db1233.eqiad.wmnet to then clone it to db1259.eqiad.wmnet - fceratto@cumin1002
  • 10:48 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1233 - Depool db1233.eqiad.wmnet to then clone it to db1259.eqiad.wmnet - fceratto@cumin1002
  • 10:48 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1259.eqiad.wmnet
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 75%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78976 and previous config saved to /var/cache/conftool/dbconfig/20250714-104756-root.json
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78975 and previous config saved to /var/cache/conftool/dbconfig/20250714-104752-root.json
  • 10:43 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts sretest2008.codfw.wmnet
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P78974 and previous config saved to /var/cache/conftool/dbconfig/20250714-104332-marostegui.json
  • 10:42 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest2007.codfw.wmnet
  • 10:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:42 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:42 moritzm: installing glibc security updates on bullseye
  • 10:42 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2007.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1207 (re)pooling @ 1%: Repooling in s5 for the first time T399430', diff saved to https://phabricator.wikimedia.org/P78973 and previous config saved to /var/cache/conftool/dbconfig/20250714-103827-root.json
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1207 to s5 depooled - T399430', diff saved to https://phabricator.wikimedia.org/P78972 and previous config saved to /var/cache/conftool/dbconfig/20250714-103600-marostegui.json
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 60%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78971 and previous config saved to /var/cache/conftool/dbconfig/20250714-103251-root.json
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78970 and previous config saved to /var/cache/conftool/dbconfig/20250714-103247-root.json
  • 10:31 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T399249)', diff saved to https://phabricator.wikimedia.org/P78969 and previous config saved to /var/cache/conftool/dbconfig/20250714-102824-marostegui.json
  • 10:17 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts sretest2007.codfw.wmnet
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 50%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78968 and previous config saved to /var/cache/conftool/dbconfig/20250714-101745-root.json
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78967 and previous config saved to /var/cache/conftool/dbconfig/20250714-101741-root.json
  • 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Commit db1259 again', diff saved to https://phabricator.wikimedia.org/P78966 and previous config saved to /var/cache/conftool/dbconfig/20250714-101735-fceratto.json
  • 09:59 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 09:59 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78965 and previous config saved to /var/cache/conftool/dbconfig/20250714-095658-root.json
  • 09:56 fceratto@cumin1002: dbctl commit (dc=all): 'Add db1259 T393296', diff saved to https://phabricator.wikimedia.org/P78964 and previous config saved to /var/cache/conftool/dbconfig/20250714-095649-fceratto.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 40%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78963 and previous config saved to /var/cache/conftool/dbconfig/20250714-095520-root.json
  • 09:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 09:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 09:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78962 and previous config saved to /var/cache/conftool/dbconfig/20250714-094901-root.json
  • 09:48 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:48 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T399249)', diff saved to https://phabricator.wikimedia.org/P78961 and previous config saved to /var/cache/conftool/dbconfig/20250714-094646-marostegui.json
  • 09:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T399249)', diff saved to https://phabricator.wikimedia.org/P78960 and previous config saved to /var/cache/conftool/dbconfig/20250714-094621-marostegui.json
  • 09:46 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bookworm
  • 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78959 and previous config saved to /var/cache/conftool/dbconfig/20250714-094132-root.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1161 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78958 and previous config saved to /var/cache/conftool/dbconfig/20250714-094048-marostegui.json
  • 09:40 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 25%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78957 and previous config saved to /var/cache/conftool/dbconfig/20250714-094014-root.json
  • 09:36 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1200.eqiad.wmnet onto db1207.eqiad.wmnet
  • 09:36 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1200 gradually with 4 steps - Pool db1200.eqiad.wmnet in after cloning
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P78955 and previous config saved to /var/cache/conftool/dbconfig/20250714-093114-marostegui.json
  • 09:28 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 09:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 10 hosts with reason: Maintenance
  • 09:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78954 and previous config saved to /var/cache/conftool/dbconfig/20250714-092626-root.json
  • 09:25 jelto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 10%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78953 and previous config saved to /var/cache/conftool/dbconfig/20250714-092508-root.json
  • 09:16 marostegui: Stop mariadb on db1154 for migration, there will be lag on s1, s3, s5, s8 and x3 T398928
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P78951 and previous config saved to /var/cache/conftool/dbconfig/20250714-091605-marostegui.json
  • 09:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 9 hosts with reason: Maintenance
  • 09:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 09:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 09:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2243 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78950 and previous config saved to /var/cache/conftool/dbconfig/20250714-091121-root.json
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 5%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78949 and previous config saved to /var/cache/conftool/dbconfig/20250714-091003-root.json
  • 09:08 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 09:06 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bookworm
  • 09:04 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2243.codfw.wmnet with reason: Maintenance
  • 09:01 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2243.codfw.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T399249)', diff saved to https://phabricator.wikimedia.org/P78947 and previous config saved to /var/cache/conftool/dbconfig/20250714-090057-marostegui.json
  • 08:58 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2243 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78946 and previous config saved to /var/cache/conftool/dbconfig/20250714-085556-marostegui.json
  • 08:55 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2243.codfw.wmnet with reason: Maintenance
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1047 (re)pooling @ 1%: Pooling for the first time in es6 T395771', diff saved to https://phabricator.wikimedia.org/P78945 and previous config saved to /var/cache/conftool/dbconfig/20250714-085457-root.json
  • 08:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:51 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1200 gradually with 4 steps - Pool db1200.eqiad.wmnet in after cloning
  • 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest1001.eqiad.wmnet
  • 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:49 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Add es1047 to es6 depooled T395771', diff saved to https://phabricator.wikimedia.org/P78943 and previous config saved to /var/cache/conftool/dbconfig/20250714-084506-marostegui.json
  • 08:39 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:32 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts sretest1001.eqiad.wmnet
  • 08:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1259.eqiad.wmnet with reason: New host setup
  • 08:26 volans@cumin2002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 93
  • 08:25 volans@cumin2002: START - Cookbook sre.network.debug for Netbox circuit ID 93
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T399249)', diff saved to https://phabricator.wikimedia.org/P78942 and previous config saved to /var/cache/conftool/dbconfig/20250714-081939-marostegui.json
  • 08:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T399249)', diff saved to https://phabricator.wikimedia.org/P78941 and previous config saved to /var/cache/conftool/dbconfig/20250714-081917-marostegui.json
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P78940 and previous config saved to /var/cache/conftool/dbconfig/20250714-080409-marostegui.json
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P78939 and previous config saved to /var/cache/conftool/dbconfig/20250714-074902-marostegui.json
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T399249)', diff saved to https://phabricator.wikimedia.org/P78938 and previous config saved to /var/cache/conftool/dbconfig/20250714-073354-marostegui.json
  • 07:21 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 07:20 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 07:19 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
  • 07:08 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
  • 06:54 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dale Zhou out of all services on: 2395 hosts
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T399249)', diff saved to https://phabricator.wikimedia.org/P78937 and previous config saved to /var/cache/conftool/dbconfig/20250714-065240-marostegui.json
  • 06:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1200 - Depool db1200.eqiad.wmnet to then clone it to db1207.eqiad.wmnet - marostegui@cumin1002
  • 06:41 marostegui@cumin1002: START - Cookbook sre.mysql.depool db1200 - Depool db1200.eqiad.wmnet to then clone it to db1207.eqiad.wmnet - marostegui@cumin1002
  • 06:41 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1200.eqiad.wmnet onto db1207.eqiad.wmnet
  • 06:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui: Failover m1 from db1207 to db1213 - T399172
  • 06:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2232].codfw.wmnet,db[1207,1213,1217].eqiad.wmnet with reason: Primary switchover m1 T399172
  • 05:50 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:49 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:45 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:44 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:40 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:39 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:35 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 05:34 amastilovic@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply

2025-07-13

  • 18:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 17:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 17:36 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 17:17 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 14:11 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 13:47 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 13:42 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 13:24 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 13:23 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet2006-dev.codfw.wmnet with OS bullseye
  • 13:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bullseye

2025-07-12

  • 21:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1259.eqiad.wmnet with OS bookworm
  • 21:04 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:04 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1259.eqiad.wmnet with reason: host reimage
  • 20:42 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1259.eqiad.wmnet with reason: host reimage
  • 20:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm
  • 20:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:11 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:10 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1259.eqiad.wmnet with OS bookworm
  • 19:53 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm
  • 19:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1259.eqiad.wmnet with OS bookworm
  • 08:47 moritzm: restarted Tomcat on idp1004

2025-07-11

  • 22:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host db1259.eqiad.wmnet with OS bookworm
  • 22:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host db1259.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:49 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1259
  • 21:48 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host db1259
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1259 - vriley@cumin1002"
  • 21:46 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db1259 - vriley@cumin1002"
  • 21:43 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 18:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 18:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 18:20 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 18:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 18:07 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 18:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 18:03 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 18:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 17:57 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 17:48 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 17:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 17:39 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 17:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 17:11 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 17:10 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: done testing issues with primary arelion link, T399221]
  • 17:10 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: done testing issues with primary arelion link, T399221]
  • 16:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 16:51 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 16:46 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 16:31 topranks: drain Arelion CCT from codfw to eqsin - still see minor packet loss which is affecting purged T399221
  • 16:19 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 16:16 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 15:56 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 15:54 topranks: un-drain Arelion CCT from codfw to eqsin T399221
  • 15:44 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 15:44 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 15:39 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:38 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 15:38 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:36 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:35 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:28 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: testing issues with primary arelion link, T399221]
  • 15:28 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: testing issues with primary arelion link, T399221]
  • 15:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78933 and previous config saved to /var/cache/conftool/dbconfig/20250711-145205-root.json
  • 14:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 14:44 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78932 and previous config saved to /var/cache/conftool/dbconfig/20250711-143659-root.json
  • 14:27 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 14:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 14:25 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 14:24 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78931 and previous config saved to /var/cache/conftool/dbconfig/20250711-142154-root.json
  • 14:10 akosiaris: sudo swapoff /dev/md1 on cloudcephosd1036 T399281
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78930 and previous config saved to /var/cache/conftool/dbconfig/20250711-140648-root.json
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2242 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78929 and previous config saved to /var/cache/conftool/dbconfig/20250711-135919-marostegui.json
  • 13:59 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2242.codfw.wmnet with reason: Maintenance
  • 13:55 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:48 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:48 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:41 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78927 and previous config saved to /var/cache/conftool/dbconfig/20250711-133539-root.json
  • 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78925 and previous config saved to /var/cache/conftool/dbconfig/20250711-132034-root.json
  • 13:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bookworm
  • 13:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bookworm
  • 13:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 13:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bookworm
  • 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78924 and previous config saved to /var/cache/conftool/dbconfig/20250711-130528-root.json
  • 13:03 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1034 gradually with 4 steps - Pooling in
  • 13:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 12:57 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 12:57 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 12:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 12:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS trixie
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78921 and previous config saved to /var/cache/conftool/dbconfig/20250711-125022-root.json
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 12:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2187 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78919 and previous config saved to /var/cache/conftool/dbconfig/20250711-124249-marostegui.json
  • 12:42 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:33 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 12:30 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es1032 - Depooling RO host
  • 12:30 fceratto@cumin1002: START - Cookbook sre.mysql.depool es1032 - Depooling RO host
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bookworm
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bookworm
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bookworm
  • 12:28 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) es1032 - Depooling RO host
  • 12:28 fceratto@cumin1002: START - Cookbook sre.mysql.depool es1032 - Depooling RO host
  • 12:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 12:24 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:24 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:22 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:22 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 12:20 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es1034 gradually with 4 steps - Pooling in
  • 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool es1034 gradually with 4 steps - Pooling in
  • 12:19 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:19 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:17 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 12:17 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1034.eqiad.wmnet
  • 12:17 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es1034.eqiad.wmnet
  • 12:06 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:06 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:04 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:04 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:03 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es1034.eqiad.wmnet
  • 12:01 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:01 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:52 fceratto@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1034.eqiad.wmnet
  • 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78916 and previous config saved to /var/cache/conftool/dbconfig/20250711-114439-root.json
  • 11:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depool es1034 for upgrade', diff saved to https://phabricator.wikimedia.org/P78915 and previous config saved to /var/cache/conftool/dbconfig/20250711-113532-fceratto.json
  • 11:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 11:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 11:30 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1031.eqiad.wmnet
  • 11:30 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es1031.eqiad.wmnet
  • 11:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78914 and previous config saved to /var/cache/conftool/dbconfig/20250711-112933-root.json
  • 11:26 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 11:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78913 and previous config saved to /var/cache/conftool/dbconfig/20250711-111428-root.json
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78912 and previous config saved to /var/cache/conftool/dbconfig/20250711-105922-root.json
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78911 and previous config saved to /var/cache/conftool/dbconfig/20250711-105039-root.json
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78910 and previous config saved to /var/cache/conftool/dbconfig/20250711-103533-root.json
  • 10:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:32 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:31 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:26 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS trixie
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78909 and previous config saved to /var/cache/conftool/dbconfig/20250711-102027-root.json
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78908 and previous config saved to /var/cache/conftool/dbconfig/20250711-100522-root.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2192', diff saved to https://phabricator.wikimedia.org/P78907 and previous config saved to /var/cache/conftool/dbconfig/20250711-100106-root.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78906 and previous config saved to /var/cache/conftool/dbconfig/20250711-100033-root.json
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78905 and previous config saved to /var/cache/conftool/dbconfig/20250711-094527-root.json
  • 09:39 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2192 T399280', diff saved to https://phabricator.wikimedia.org/P78904 and previous config saved to /var/cache/conftool/dbconfig/20250711-093115-root.json
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T399280', diff saved to https://phabricator.wikimedia.org/P78903 and previous config saved to /var/cache/conftool/dbconfig/20250711-093006-marostegui.json
  • 09:29 marostegui: Starting s5 codfw failover from db2192 to db2213 - T399280
  • 09:27 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 09:25 moritzm: imported perccli for trixie-wikimedia T391083
  • 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T399280', diff saved to https://phabricator.wikimedia.org/P78902 and previous config saved to /var/cache/conftool/dbconfig/20250711-091812-root.json
  • 09:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T399280
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78901 and previous config saved to /var/cache/conftool/dbconfig/20250711-091242-root.json
  • 09:04 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1003.eqiad.wmnet with OS trixie
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78900 and previous config saved to /var/cache/conftool/dbconfig/20250711-085736-root.json
  • 08:51 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 08:51 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78899 and previous config saved to /var/cache/conftool/dbconfig/20250711-084230-root.json
  • 08:42 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:41 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:34 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetserver2003.codfw.wmnet
  • 08:34 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:34 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetserver2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:33 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetserver2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:30 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2223 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78898 and previous config saved to /var/cache/conftool/dbconfig/20250711-082725-root.json
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2223 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78897 and previous config saved to /var/cache/conftool/dbconfig/20250711-081953-marostegui.json
  • 08:19 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 08:18 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts puppetserver2003.codfw.wmnet
  • 08:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 07:56 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78896 and previous config saved to /var/cache/conftool/dbconfig/20250711-072439-root.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78895 and previous config saved to /var/cache/conftool/dbconfig/20250711-070933-root.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78894 and previous config saved to /var/cache/conftool/dbconfig/20250711-065428-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78893 and previous config saved to /var/cache/conftool/dbconfig/20250711-063922-root.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78892 and previous config saved to /var/cache/conftool/dbconfig/20250711-063156-marostegui.json
  • 06:31 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 03:35 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1041.eqiad.wmnet
  • 03:35 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1041.eqiad.wmnet
  • 03:21 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1041.eqiad.wmnet
  • 03:13 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1041.eqiad.wmnet
  • 02:30 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1040.eqiad.wmnet
  • 02:30 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1040.eqiad.wmnet
  • 02:17 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1040.eqiad.wmnet
  • 02:13 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1040.eqiad.wmnet
  • 02:11 andrew@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:11 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:10 andrew@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:10 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:09 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 02:01 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1039.eqiad.wmnet
  • 01:17 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1038.eqiad.wmnet
  • 01:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1038.eqiad.wmnet
  • 01:03 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1038.eqiad.wmnet
  • 00:57 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1038.eqiad.wmnet
  • 00:55 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318) (duration: 11m 30s)
  • 00:50 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:47 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bookworm
  • 00:45 krinkle@deploy1003: krinkle: Backport for beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:43 krinkle@deploy1003: Started scap sync-world: Backport for beta: Change FileRepo zone URL to upload.wikimedia.beta.wmcloud.org (T289318)
  • 00:34 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318) (duration: 12m 13s)
  • 00:29 krinkle@deploy1003: krinkle: Continuing with sync
  • 00:27 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 00:24 krinkle@deploy1003: krinkle: Backport for beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:22 krinkle@deploy1003: Started scap sync-world: Backport for beta: Move beta wikipedia canonical to beta.wmcloud.org (T289318)
  • 00:21 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage

2025-07-10

  • 23:59 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bookworm
  • 23:57 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1037.eqiad.wmnet
  • 23:57 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1037.eqiad.wmnet
  • 23:44 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1037.eqiad.wmnet
  • 23:03 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1037.eqiad.wmnet
  • 23:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bookworm
  • 22:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 22:39 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 22:30 zabe@deploy1003: Finished scap sync-world: Backport for Fix categorylinks read new query for excluded categories (T385890) (duration: 07m 59s)
  • 22:25 zabe@deploy1003: zabe: Continuing with sync
  • 22:24 zabe@deploy1003: zabe: Backport for Fix categorylinks read new query for excluded categories (T385890) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:22 zabe@deploy1003: Started scap sync-world: Backport for Fix categorylinks read new query for excluded categories (T385890)
  • 22:16 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bookworm
  • 22:13 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1036.eqiad.wmnet
  • 22:13 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1036.eqiad.wmnet
  • 22:00 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1036.eqiad.wmnet
  • 21:55 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1036.eqiad.wmnet
  • 21:32 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bookworm
  • 21:12 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 21:06 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 20:54 jforrester@deploy1003: Finished scap sync-world: Backport for Use `sul` dblist in InitialiseSettings (duration: 11m 43s)
  • 20:48 jforrester@deploy1003: jforrester, bd808: Continuing with sync
  • 20:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bookworm
  • 20:44 jforrester@deploy1003: jforrester, bd808: Backport for Use `sul` dblist in InitialiseSettings synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:42 andrew@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:42 jforrester@deploy1003: Started scap sync-world: Backport for Use `sul` dblist in InitialiseSettings
  • 20:41 andrew@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:39 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1035.eqiad.wmnet
  • 20:25 root@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:25 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:25 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1035.eqiad.wmnet
  • 20:24 root@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:24 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:21 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 20:14 aqu@deploy1003: Finished deploy [airflow-dags/analytics@c558ea4]: Artifactct analytics / main (duration: 00m 43s)
  • 20:13 aqu@deploy1003: Started deploy [airflow-dags/analytics@c558ea4]: Artifactct analytics / main
  • 20:12 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@c558ea4]: Artifactct analytics-test (duration: 00m 13s)
  • 20:12 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@c558ea4]: Artifactct analytics-test
  • 19:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 19:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 19:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 19:07 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:07 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:05 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:05 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 19:01 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:01 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 19:00 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 19:00 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 19:00 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035.eqiad.wmnet']
  • 19:00 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035.eqiad.wmnet']
  • 18:59 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 18:58 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:58 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:50 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:49 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:49 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:48 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:47 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: arelion drained; traffic is going through ulsfo to codfw, T399221]
  • 18:47 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: arelion drained; traffic is going through ulsfo to codfw, T399221]
  • 18:44 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:44 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:44 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:44 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:43 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:42 andrew@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:42 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1035.eqiad.wmnet
  • 18:39 sukhe: clearing varnish and ATS cache on cp5017 before repooling eqsin: T399221
  • 18:39 sukhe: sukhe@cp5017:~$ sudo systemctl stop trafficserver.service && sudo traffic_server -C clear_cache && sudo systemctl start trafficserver.service: T399221
  • 18:39 sukhe: sukhe@cp5017:~$ sudo systemctl stop trafficserver.service && sudo traffic_server -C clear_cache && sudo systemctl start trafficserver.service
  • 18:28 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 18:28 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 18:19 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 18:18 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78890 and previous config saved to /var/cache/conftool/dbconfig/20250710-175730-root.json
  • 17:56 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:55 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:55 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:42 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78889 and previous config saved to /var/cache/conftool/dbconfig/20250710-174225-root.json
  • 17:33 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:28 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78887 and previous config saved to /var/cache/conftool/dbconfig/20250710-172719-root.json
  • 17:25 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1049.eqiad.wmnet
  • 17:25 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1049.eqiad.wmnet
  • 17:22 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1049.eqiad.wmnet']
  • 17:22 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1049.eqiad.wmnet']
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78886 and previous config saved to /var/cache/conftool/dbconfig/20250710-171214-root.json
  • 17:05 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bookworm
  • 16:50 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis mediawikiwiki, testwiki in section s3
  • 16:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:22 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:21 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:20 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 hnowlan@dns1004: END - running authdns-update
  • 15:57 hnowlan@dns1004: START - running authdns-update
  • 15:54 xcollazo: refreshed YARN queues definition in production via https://phabricator.wikimedia.org/T399013#10992686
  • 15:52 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 15:40 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 15:35 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis mediawikiwiki, testwiki in section s3
  • 15:32 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 15:30 hnowlan@dns1004: END - running authdns-update
  • 15:29 hnowlan@dns1004: START - running authdns-update
  • 15:25 volans: upgrade spicerack to 11.3.0 on cumin100[2-3]
  • 15:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS bookworm
  • 15:20 aikochou@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 15:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 15:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 15:11 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit2003.wikimedia.org with reason: maintenance
  • 15:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 15:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 14:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bookworm
  • 14:54 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS bookworm
  • 14:54 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 14:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: no reason specified, no task ID specified]
  • 14:41 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: no reason specified, no task ID specified]
  • 14:41 vgutierrez: depooling eqsin
  • 14:38 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 14:34 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 14:33 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 14:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:30 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78884 and previous config saved to /var/cache/conftool/dbconfig/20250710-142707-root.json
  • 14:24 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 14:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 14:16 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bookworm
  • 14:15 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 14:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78883 and previous config saved to /var/cache/conftool/dbconfig/20250710-141202-root.json
  • 14:04 andrew@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1008.eqiad.wmnet']
  • 14:03 elukey@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 14:03 elukey@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
  • 14:03 elukey@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:02 elukey@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:01 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
  • 14:00 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1002.eqiad.wmnet
  • 13:58 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
  • 13:57 vgutierrez: restarting varnish and ATS in cp5017
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78882 and previous config saved to /var/cache/conftool/dbconfig/20250710-135656-root.json
  • 13:52 hashar: UTC afternoon backport window completed
  • 13:51 hashar@deploy1003: Finished scap sync-world: Backport for fix(StructuredTask): wrong order in resolving a deferred (duration: 11m 10s)
  • 13:51 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1002.eqiad.wmnet
  • 13:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 13:49 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1008.eqiad.wmnet']
  • 13:48 volans@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest1001.eqiad.wmnet
  • 13:47 klausman@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
  • 13:46 volans: upgrade spicerack on cumin2002 to 11.3.0
  • 13:46 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2006.codfw.wmnet with OS trixie
  • 13:46 hashar@deploy1003: migr, hashar: Continuing with sync
  • 13:42 hashar@deploy1003: migr, hashar: Backport for fix(StructuredTask): wrong order in resolving a deferred synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2211 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78881 and previous config saved to /var/cache/conftool/dbconfig/20250710-134150-root.json
  • 13:40 hashar@deploy1003: Started scap sync-world: Backport for fix(StructuredTask): wrong order in resolving a deferred
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2211 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78880 and previous config saved to /var/cache/conftool/dbconfig/20250710-133418-marostegui.json
  • 13:34 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78879 and previous config saved to /var/cache/conftool/dbconfig/20250710-133047-root.json
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78878 and previous config saved to /var/cache/conftool/dbconfig/20250710-131541-root.json
  • 13:08 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:06 moritzm: installing ICU security updates
  • 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78876 and previous config saved to /var/cache/conftool/dbconfig/20250710-130036-root.json
  • 12:59 klausman@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:52 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78875 and previous config saved to /var/cache/conftool/dbconfig/20250710-124530-root.json
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78874 and previous config saved to /var/cache/conftool/dbconfig/20250710-124051-root.json
  • 12:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 12:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2171 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78873 and previous config saved to /var/cache/conftool/dbconfig/20250710-123809-marostegui.json
  • 12:38 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:35 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 12:32 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78872 and previous config saved to /var/cache/conftool/dbconfig/20250710-122545-root.json
  • 12:25 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 12:17 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=97) Managing sanitization for wikis mediawikiwiki, testwiki in section s3
  • 12:15 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis mediawikiwiki, testwiki in section s3
  • 12:14 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 12:11 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78871 and previous config saved to /var/cache/conftool/dbconfig/20250710-121039-root.json
  • 12:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis mediawikiwiki, testwiki in section s5
  • 12:01 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 12:00 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 2.8.15 upgrade (T398720)
  • 11:57 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78870 and previous config saved to /var/cache/conftool/dbconfig/20250710-115534-root.json
  • 11:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:53 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:52 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis mediawikiwiki, testwiki in section s5
  • 11:51 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis mediawikiwiki, testwiki in section s5
  • 11:49 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1200 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78869 and previous config saved to /var/cache/conftool/dbconfig/20250710-114739-marostegui.json
  • 11:47 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 11:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis mediawikiwiki, testwiki in section s5
  • 11:44 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:41 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:39 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
  • 11:35 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:35 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:35 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 11:34 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:33 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 11:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 11:30 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bookworm
  • 11:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:29 andrew@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1007.eqiad.wmnet']
  • 11:24 vgutierrez: rolling restart of purged in eqsin
  • 11:21 andrew@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1007.eqiad.wmnet']
  • 11:14 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:09 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 2.8.15 upgrade (T398720)
  • 11:06 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 11:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1039.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1039', diff saved to https://phabricator.wikimedia.org/P78867 and previous config saved to /var/cache/conftool/dbconfig/20250710-110408-marostegui.json
  • 10:33 elukey: kafka preferred-replica-election on kafka-main2010
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:05 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 10:04 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 10:04 vgutierrez: resetting eqiad.resource-topic offsets for cp5017 consumer group
  • 09:45 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 09:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 09:44 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 09:44 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 09:43 moritzm: installing initramfs-tools bugfix updates from Bookworm point release
  • 09:15 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2240 gradually with 4 steps - Pooling in
  • 09:15 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2240 gradually with 4 steps - Pooling in
  • 09:14 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2161 gradually with 4 steps - Pooling in
  • 09:14 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2161 gradually with 4 steps - Pooling in
  • 09:12 fceratto@cumin1002: dbctl commit (dc=all): 'Update db2240 T397163', diff saved to https://phabricator.wikimedia.org/P78865 and previous config saved to /var/cache/conftool/dbconfig/20250710-091250-fceratto.json
  • 09:05 vgutierrez: restarting purged on cp5017
  • 09:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:53 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:51 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:45 moritzm: installing setuptools security updates
  • 08:40 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 08:40 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78863 and previous config saved to /var/cache/conftool/dbconfig/20250710-083719-root.json
  • 08:31 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:30 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78861 and previous config saved to /var/cache/conftool/dbconfig/20250710-082213-root.json
  • 08:15 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:12 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 2.8.15 upgrade (T398720)
  • 08:11 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.9 refs T392179
  • 08:10 moritzm: installing containerd security updates
  • 08:07 klausman@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=inference,name=codfw
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78860 and previous config saved to /var/cache/conftool/dbconfig/20250710-080708-root.json
  • 08:07 klausman@cumin1002: conftool action : get/pooled; selector: dnsdisc=inference,name=codfw
  • 08:05 klausman: Depooling Liftwing prod in codfw so we can roll out some changes that restart all services (cf. T398533)
  • 08:00 moritzm: installing python-urllib3 security updates
  • 07:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1047.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78859 and previous config saved to /var/cache/conftool/dbconfig/20250710-075202-root.json
  • 07:51 vgutierrez: switching to upload cert globally on upload CDN cluster - T394484
  • 07:47 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 07:44 elukey@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2178 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78858 and previous config saved to /var/cache/conftool/dbconfig/20250710-074432-marostegui.json
  • 07:44 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 07:44 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and A:cp - 2.8.15 upgrade (T398720)
  • 07:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and A:cp - 2.8.15 upgrade (T398720)
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78857 and previous config saved to /var/cache/conftool/dbconfig/20250710-073907-root.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78856 and previous config saved to /var/cache/conftool/dbconfig/20250710-073123-root.json
  • 07:29 hashar: Restarting CI Jenkins
  • 07:25 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78855 and previous config saved to /var/cache/conftool/dbconfig/20250710-072401-root.json
  • 07:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78853 and previous config saved to /var/cache/conftool/dbconfig/20250710-071616-root.json
  • 07:10 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78852 and previous config saved to /var/cache/conftool/dbconfig/20250710-070855-root.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78851 and previous config saved to /var/cache/conftool/dbconfig/20250710-070111-root.json
  • 07:00 moritzm: installing libbpf security updates
  • 06:59 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and A:cp - 2.8.15 upgrade (T398720)
  • 06:59 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 06:58 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 06:58 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 06:55 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78849 and previous config saved to /var/cache/conftool/dbconfig/20250710-065350-root.json
  • 06:52 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 06:52 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and A:cp - 2.8.15 upgrade (T398720)
  • 06:49 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 06:47 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2228 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78848 and previous config saved to /var/cache/conftool/dbconfig/20250710-064605-root.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1210 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78847 and previous config saved to /var/cache/conftool/dbconfig/20250710-064558-marostegui.json
  • 06:45 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 06:44 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 06:44 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 06:39 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2228 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78846 and previous config saved to /var/cache/conftool/dbconfig/20250710-063535-marostegui.json
  • 06:35 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 05:54 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:38 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:22 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1003"
  • 05:22 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1003
  • 05:21 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1003
  • 05:21 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1003"
  • 04:58 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 04:57 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 04:56 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 04:55 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 04:32 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 04:32 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 04:29 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 04:18 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 04:17 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 04:16 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 04:16 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 04:14 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 04:13 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 04:12 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 04:10 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 03:58 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 03:55 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 03:37 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 03:36 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 03:01 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 03:01 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 02:53 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 02:46 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:39 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:37 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:29 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:29 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:28 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:11 root@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:04 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:04 root@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:03 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:03 root@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 02:03 root@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1006.eqiad.wmnet']
  • 01:55 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudcephosd1006.eqiad.wmnet
  • 01:53 andrew@cumin1003: START - Cookbook sre.hosts.dhcp for host cloudcephosd1006.eqiad.wmnet
  • 01:53 andrew@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 01:39 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 00:42 andrew@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1006.eqiad.wmnet with OS bookworm

2025-07-09

  • 23:21 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bookworm
  • 22:29 dreamyjazz@deploy1003: Finished scap sync-world: Backport for ukwiki: allow bureaucrats to assign and remove temporary-account-viewer group (T398738) (duration: 10m 18s)
  • 22:23 dreamyjazz@deploy1003: dreamyjazz, dreamrimmer: Continuing with sync
  • 22:21 dreamyjazz@deploy1003: dreamyjazz, dreamrimmer: Backport for ukwiki: allow bureaucrats to assign and remove temporary-account-viewer group (T398738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:18 dreamyjazz@deploy1003: Started scap sync-world: Backport for ukwiki: allow bureaucrats to assign and remove temporary-account-viewer group (T398738)
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1048.eqiad.wmnet with OS bookworm
  • 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:45 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 21:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 21:24 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 21:15 dancy@deploy1003: Finished scap sync-world: Backport for Add DEPRECATED_LANGUAGE_CODE_MAPPING to wgInterlanguageLinkCodeMap (T248352) (duration: 10m 25s)
  • 21:10 dancy@deploy1003: dancy, fomafix: Continuing with sync
  • 21:07 dancy@deploy1003: dancy, fomafix: Backport for Add DEPRECATED_LANGUAGE_CODE_MAPPING to wgInterlanguageLinkCodeMap (T248352) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:05 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1048.eqiad.wmnet with OS bookworm
  • 21:05 dancy@deploy1003: Started scap sync-world: Backport for Add DEPRECATED_LANGUAGE_CODE_MAPPING to wgInterlanguageLinkCodeMap (T248352)
  • 20:55 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 20:36 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 20:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:28 jforrester@deploy1003: Finished scap sync-world: Backport for Pre-deploy Readers Use Cases Survey on enwiki (T398870) (duration: 11m 00s)
  • 20:23 jforrester@deploy1003: jforrester, dani: Continuing with sync
  • 20:19 jforrester@deploy1003: jforrester, dani: Backport for Pre-deploy Readers Use Cases Survey on enwiki (T398870) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 jforrester@deploy1003: Started scap sync-world: Backport for Pre-deploy Readers Use Cases Survey on enwiki (T398870)
  • 20:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 20:13 James_F: jforrester@deploy1003:~$ echo 'https://en.wikipedia.org/static/favicon/wikifunctions.ico' | mwscript-k8s --attach purgeList.php -- --wiki enwiki # T326094
  • 20:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 20:13 jforrester@deploy1003: Finished scap sync-world: Backport for shwiki: Add bs, hr and sr as import sources (T399113), Remove white outline from Wikifunctions favicon (T326094) (duration: 08m 52s)
  • 20:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1047.eqiad.wmnet with OS bookworm
  • 20:10 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:10 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 20:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 20:08 jforrester@deploy1003: jforrester, jhsoby, aleksandar: Continuing with sync
  • 20:07 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 20:06 jforrester@deploy1003: jforrester, jhsoby, aleksandar: Backport for shwiki: Add bs, hr and sr as import sources (T399113), Remove white outline from Wikifunctions favicon (T326094) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 jforrester@deploy1003: Started scap sync-world: Backport for shwiki: Add bs, hr and sr as import sources (T399113), Remove white outline from Wikifunctions favicon (T326094)
  • 19:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 19:51 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 19:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:47 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 19:43 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1017.eqiad.wmnet with OS bullseye
  • 19:42 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 19:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:24 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1047.eqiad.wmnet with OS bookworm
  • 19:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:16 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:16 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:12 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:10 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:45 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Upgrade
  • 18:42 sukhe: re-adding ocsp to deployment-prep: commit 3307286: T399114: will remove after Puppet removal
  • 18:40 sukhe: removing ocsp from deployment-prep: commit 3307286: T399114
  • 18:35 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Upgrade
  • 18:33 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Upgrade
  • 18:23 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Upgrade
  • 18:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:34 sukhe: re-enabling Puppet on P{ganeti7002* or ganeti7003*}: it was left disabled there during rollout of CR 1166222 by sukhe
  • 16:50 bking@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for wdqs2023.codfw.wmnet: Renew puppet certificate - bking@cumin1002
  • 16:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:17 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 16:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:08 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:07 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:07 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:07 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:06 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:06 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:02 vgutierrez: switching esams, eqsin and drmrs to Let's Encrypt unified/upload certs - T398596
  • 15:57 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 15:54 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 15:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:40 volans: uploaded spicerack_11.3.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
  • 15:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:33 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bookworm
  • 15:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:23 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: cirrussearch@eqiad
  • 15:23 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
  • 15:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:20 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
  • 15:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:12 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 15:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch@eqiad
  • 15:09 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-eqiad-omega@eqiad
  • 15:05 zabe@deploy1003: Finished scap sync-world: Backport for Revert^2 "Enable categorylinks read new on a few large wikis" (duration: 08m 11s)
  • 15:04 moritzm: installing abseil security updates
  • 15:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-eqiad-omega@eqiad
  • 15:03 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-eqiad-psi@eqiad
  • 15:00 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:00 zabe@deploy1003: zabe: Continuing with sync
  • 14:59 zabe@deploy1003: zabe: Backport for Revert^2 "Enable categorylinks read new on a few large wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:57 zabe@deploy1003: Started scap sync-world: Backport for Revert^2 "Enable categorylinks read new on a few large wikis"
  • 14:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-eqiad-psi@eqiad
  • 14:49 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 14:49 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 14:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78842 and previous config saved to /var/cache/conftool/dbconfig/20250709-144440-root.json
  • 14:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78841 and previous config saved to /var/cache/conftool/dbconfig/20250709-144250-root.json
  • 14:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: cirrussearch@codfw
  • 14:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
  • 14:40 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
  • 14:35 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2004-dev.codfw.wmnet with OS bookworm
  • 14:34 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 14:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 14:30 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch@codfw
  • 14:30 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-codfw-omega@codfw
  • 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78840 and previous config saved to /var/cache/conftool/dbconfig/20250709-142934-root.json
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78839 and previous config saved to /var/cache/conftool/dbconfig/20250709-142744-root.json
  • 14:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-codfw-omega@codfw
  • 14:23 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for alias: cirrussearch-codfw-psi@codfw
  • 14:23 moritzm: installing bash updates from Bookworm point release
  • 14:17 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 14:17 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:17 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: cirrussearch-codfw-psi@codfw
  • 14:15 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 14:14 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78838 and previous config saved to /var/cache/conftool/dbconfig/20250709-141428-root.json
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78837 and previous config saved to /var/cache/conftool/dbconfig/20250709-141238-root.json
  • 14:11 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:10 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:08 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:01 zabe@deploy1003: Finished scap sync-world: Backport for ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), Fix categorylinks read new code for excluding categories (T398861 T398939), Fix categorylinks read new code for excluding categories (T3988
  • 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78836 and previous config saved to /var/cache/conftool/dbconfig/20250709-135923-root.json
  • 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:58 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:58 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78835 and previous config saved to /var/cache/conftool/dbconfig/20250709-135732-root.json
  • 13:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:57 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 13:55 zabe@deploy1003: zabe: Continuing with sync
  • 13:55 sukhe@dns1004: END - running authdns-update
  • 13:55 sukhe@dns1004: START - running authdns-update
  • 13:54 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org,service=authdns-update [reason: testing alert]
  • 13:54 zabe@deploy1003: zabe: Backport for ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), Fix categorylinks read new code for excluding categories (T398861 T398939), Fix categorylinks read new code for excluding categories (T398861 T398939) synced
  • 13:54 hnowlan: delete three wedged thumbor pods showing signs of T374350
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 13:53 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=helm-charts.*,name=eqiad
  • 13:53 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bookworm
  • 13:53 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
  • 13:52 zabe@deploy1003: Started scap sync-world: Backport for ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), ApiQueryCategoryMembers: Try stop forcing index in read new code (T399037), Fix categorylinks read new code for excluding categories (T398861 T398939), Fix categorylinks read new code for excluding categories (T39886
  • 13:51 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
  • 13:50 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=helm-charts.*,name=eqiad
  • 13:50 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7002.wikimedia.org,service=authdns-update [reason: testing alert]
  • 13:50 claime: Depooling chartmuseum in eqiad
  • 13:50 cgoubert@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=helm-charts.*,name=codfw
  • 13:50 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
  • 13:49 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host es1044.eqiad.wmnet
  • 13:46 vgutierrez: deploy measure/measure-goog certs in the upload CDN cluster - T394484
  • 13:46 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
  • 13:46 cgoubert@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=helm-charts.*,name=codfw
  • 13:45 claime: Depooling chartmuseum in codfw
  • 13:45 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:45 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es1041.eqiad.wmnet
  • 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
  • 13:41 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki test2wiki --exceptions countryExceptionMappings.csv
  • 13:40 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki testwiki --exceptions countryExceptionMappings.csv
  • 13:39 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki officewiki --exceptions countryExceptionMappings.csv
  • 13:38 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
  • 13:38 marostegui@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1044.eqiad.wmnet
  • 13:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1044 for upgrade', diff saved to https://phabricator.wikimedia.org/P78829 and previous config saved to /var/cache/conftool/dbconfig/20250709-133639-marostegui.json
  • 13:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1044.eqiad.wmnet with reason: Maintenance
  • 13:36 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.9/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki metawiki --exceptions countryExceptionMappings.csv
  • 13:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
  • 13:31 marostegui@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1041.eqiad.wmnet
  • 13:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1041.eqiad.wmnet with reason: Maintenance
  • 13:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es1041.eqiad.wmnet
  • 13:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
  • 13:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 13:25 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 13:24 marostegui@cumin1002: START - Cookbook sre.hosts.reboot-single for host es1041.eqiad.wmnet
  • 13:21 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1041.eqiad.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1041', diff saved to https://phabricator.wikimedia.org/P78828 and previous config saved to /var/cache/conftool/dbconfig/20250709-132111-marostegui.json
  • 13:20 sgimeno@deploy1003: Finished scap sync-world: Backport for Add new script to update old freetext country data new schema (T397270), Growth: Enable limiting Add Link for dewiki (T396382) (duration: 10m 07s)
  • 13:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:15 sgimeno@deploy1003: mhorsey, sgimeno, migr: Continuing with sync
  • 13:12 sgimeno@deploy1003: mhorsey, sgimeno, migr: Backport for Add new script to update old freetext country data new schema (T397270), Growth: Enable limiting Add Link for dewiki (T396382) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:11 brouberol@cumin1003: END (ERROR) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=97) rolling restart_daemons on A:kafka-test-eqiad
  • 13:10 sgimeno@deploy1003: Started scap sync-world: Backport for Add new script to update old freetext country data new schema (T397270), Growth: Enable limiting Add Link for dewiki (T396382)
  • 13:09 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 13:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 12:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 12:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
  • 12:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 12:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 12:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
  • 12:54 moritzm: installing jetty9 security updates
  • 12:50 hashar@deploy1003: Finished deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833 (duration: 00m 10s)
  • 12:50 hashar@deploy1003: Started deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833
  • 12:48 hashar@deploy1003: Finished deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833 (duration: 00m 11s)
  • 12:48 hashar@deploy1003: Started deploy [gerrit/gerrit@9666238]: Add readonly plugin - T387833
  • 12:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1051
  • 12:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:39 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1051
  • 12:39 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
  • 12:39 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1050
  • 12:39 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1050
  • 12:39 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 12:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 12:37 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 2.8.15 upgrade (T398720)
  • 12:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 12:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 12:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 12:25 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 12:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 12:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 12:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:14 moritzm: installing openjdk-17 security updates
  • 12:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:09 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 12:08 brouberol@cumin1003: END (ERROR) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=97) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:08 moritzm: installing nginx security updates
  • 12:07 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:07 brouberol@cumin1003: END (FAIL) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=1) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:03 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:02 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:02 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:02 brouberol@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 12:01 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:57 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudcephosd1048,49 - jclark@cumin1002"
  • 11:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudcephosd1048,49 - jclark@cumin1002"
  • 11:57 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:56 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:56 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:56 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:55 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 11:53 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 11:52 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 11:52 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 11:51 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 11:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:49 cgoubert@deploy1003: Finished scap sync-world: Backport for PS.php: Restore poolcounter config post-reboot (T395240) (duration: 08m 39s)
  • 11:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
  • 11:48 fabfur: puppet enabled again on A:cp (T399071)
  • 11:45 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
  • 11:44 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 11:43 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
  • 11:43 cgoubert@deploy1003: cgoubert: Backport for PS.php: Restore poolcounter config post-reboot (T395240) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:42 cmooney@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1073
  • 11:41 cmooney@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1073
  • 11:41 cgoubert@deploy1003: Started scap sync-world: Backport for PS.php: Restore poolcounter config post-reboot (T395240)
  • 11:38 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1006.eqiad.wmnet
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'pool pc1', diff saved to https://phabricator.wikimedia.org/P78826 and previous config saved to /var/cache/conftool/dbconfig/20250709-113831-marostegui.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'depool pc1', diff saved to https://phabricator.wikimedia.org/P78824 and previous config saved to /var/cache/conftool/dbconfig/20250709-113717-marostegui.json
  • 11:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
  • 11:35 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2005.codfw.wmnet
  • 11:34 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter1006.eqiad.wmnet
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'pool pc2011', diff saved to https://phabricator.wikimedia.org/P78823 and previous config saved to /var/cache/conftool/dbconfig/20250709-113413-marostegui.json
  • 11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'depool pc2011', diff saved to https://phabricator.wikimedia.org/P78821 and previous config saved to /var/cache/conftool/dbconfig/20250709-113322-marostegui.json
  • 11:33 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
  • 11:32 slyngshede@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 11:32 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter2005.codfw.wmnet
  • 11:31 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[4052].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 11:29 cgoubert@deploy1003: Finished scap sync-world: Backport for PS.php: Disable primary poolcounters for reboot (T395240) (duration: 08m 19s)
  • 11:28 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:28 marostegui@cumin1002: START - Cookbook sre.mysql.parsercache
  • 11:24 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 11:23 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
  • 11:23 cgoubert@deploy1003: cgoubert: Backport for PS.php: Disable primary poolcounters for reboot (T395240) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:21 cgoubert@deploy1003: Started scap sync-world: Backport for PS.php: Disable primary poolcounters for reboot (T395240)
  • 11:14 slyngshede@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 11:13 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1007.eqiad.wmnet
  • 11:10 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter1007.eqiad.wmnet
  • 11:09 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2006.codfw.wmnet
  • 11:09 fabfur: disable puppet on A:cp to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/1167530
  • 11:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter2006.codfw.wmnet
  • 11:05 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 11:02 cgoubert@deploy1003: Finished scap sync-world: Backport for PS.php: Disable secondary poolcounters for reboot (T395240) (duration: 09m 30s)
  • 10:59 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 10:55 cgoubert@deploy1003: cgoubert: Continuing with sync
  • 10:54 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:54 cgoubert@deploy1003: cgoubert: Backport for PS.php: Disable secondary poolcounters for reboot (T395240) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:53 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@3a0cdd4]: bump image suggestions to v1.8.0 (duration: 00m 48s)
  • 10:52 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@3a0cdd4]: bump image suggestions to v1.8.0
  • 10:52 cgoubert@deploy1003: Started scap sync-world: Backport for PS.php: Disable secondary poolcounters for reboot (T395240)
  • 10:51 elukey@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:51 elukey@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:49 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 10:39 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cephosd1001.eqiad.wmnet
  • 10:38 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cephosd1001.eqiad.wmnet
  • 10:38 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 10:37 claime: Restoring memory limits on mw-cron - T395436 - T395465
  • 10:36 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 10:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
  • 10:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
  • 10:20 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
  • 10:14 claime: Cutting off access to mwmaint servers - T397017
  • 10:13 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 10:13 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cephosd2001.codfw.wmnet
  • 10:06 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo and not P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 10:04 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:04 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:58 slyngshede@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 09:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:48 slyngshede@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[4037,4045].ulsfo.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 09:45 moritzm: installing Zookeeper security updates on zk-flink
  • 09:23 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 09:21 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1001.eqiad.wmnet
  • 09:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver1001.eqiad.wmnet
  • 09:01 slyngs: Upgrade completed Netbox v4.0.11 T397300
  • 08:42 slyngshede@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1003
  • 08:35 btullis@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cephosd2001.codfw.wmnet
  • 08:34 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cephosd2001.codfw.wmnet
  • 08:29 slyngshede@cumin1003: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1003
  • 08:28 slyngshede@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1002
  • 08:21 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.11 to production - slyngshede@cumin1002
  • 08:20 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.9 refs T392179
  • 08:18 slyngs: Deploying Netbox v4.0.11 to production T397300
  • 08:17 slyngshede@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:17 slyngshede@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:09 aklapper@deploy1003: Finished scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) (duration: 08m 21s)
  • 08:04 aklapper@deploy1003: zabe, aklapper: Continuing with sync
  • 08:03 aklapper@deploy1003: zabe, aklapper: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:01 aklapper@deploy1003: Started scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)
  • 07:58 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 07:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 07:50 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 07:42 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.parsercache (exit_code=99)
  • 07:42 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 07:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1036.eqiad.wmnet with reason: Maintenance
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1036', diff saved to https://phabricator.wikimedia.org/P78817 and previous config saved to /var/cache/conftool/dbconfig/20250709-073458-marostegui.json
  • 07:32 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:32 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:31 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:31 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:31 kartik@deploy1003: Finished scap sync-world: Backport for CX: Add virtual-cx-shared DatabaseVirtualDomains (T348513) (duration: 25m 21s)
  • 07:31 moritzm: installing nginx security updates
  • 07:26 kartik@deploy1003: kartik, abi: Continuing with sync
  • 07:23 elukey: upload python3-docker-report 0.0.16 to bookworm-wikimedia
  • 07:23 elukey: upload python3-docker-report to bookworm-wikimedia
  • 07:20 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 07:08 kartik@deploy1003: kartik, abi: Backport for CX: Add virtual-cx-shared DatabaseVirtualDomains (T348513) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:05 kartik@deploy1003: Started scap sync-world: Backport for CX: Add virtual-cx-shared DatabaseVirtualDomains (T348513)
  • 07:05 elukey@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 06:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2232].codfw.wmnet,db[1207,1217].eqiad.wmnet with reason: migration to mariadb 10.11
  • 06:36 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:29 marostegui: Failover m3 from db1213 to db1250 - T398818
  • 06:21 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2234].codfw.wmnet,db[1213,1217,1250].eqiad.wmnet with reason: m3 master switchover T398818
  • 06:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2234].codfw.wmnet,db[1213,1217,1250].eqiad.wmnet with reason: m3 master switchover T398818
  • 06:13 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:58 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:23 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 04:23 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply

2025-07-08

  • 23:58 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 23:58 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 23:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1048.eqiad.wmnet with OS bookworm
  • 23:43 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:43 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:34 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 23:19 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 23:15 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1048.eqiad.wmnet with reason: host reimage
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1047.eqiad.wmnet with OS bookworm
  • 23:09 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:06 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:53 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1048.eqiad.wmnet with OS bookworm
  • 22:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 22:38 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1047.eqiad.wmnet with reason: host reimage
  • 22:27 zabe@deploy1003: Finished scap sync-world: Backport for Revert "Enable categorylinks read new on a few large wikis" (duration: 08m 38s)
  • 22:21 zabe@deploy1003: zabe: Continuing with sync
  • 22:20 zabe@deploy1003: zabe: Backport for Revert "Enable categorylinks read new on a few large wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:18 zabe@deploy1003: Started scap sync-world: Backport for Revert "Enable categorylinks read new on a few large wikis"
  • 22:18 zabe@deploy1003: Finished scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) (duration: 08m 33s)
  • 22:16 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host es1047.eqiad.wmnet with OS bookworm
  • 22:13 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 22:13 zabe@deploy1003: zabe: Continuing with sync
  • 22:12 zabe@deploy1003: zabe: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:09 zabe@deploy1003: Started scap sync-world: Backport for Remove stdClass type hint from ApiFeedContributions::feedItem() for now (T398925)
  • 22:08 zabe@deploy1003: Finished scap sync-world: Backport for Enable categorylinks read new on a few large wikis (T397912) (duration: 08m 19s)
  • 22:03 zabe@deploy1003: zabe: Continuing with sync
  • 22:02 zabe@deploy1003: zabe: Backport for Enable categorylinks read new on a few large wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:01 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:00 zabe@deploy1003: Started scap sync-world: Backport for Enable categorylinks read new on a few large wikis (T397912)
  • 22:00 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
  • 21:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:51 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 21:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 21:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1048
  • 21:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1048
  • 21:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1049
  • 21:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1049
  • 21:45 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:41 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:40 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:38 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1048
  • 21:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1048
  • 21:38 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1049
  • 21:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1049
  • 21:38 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 21:38 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd1051
  • 21:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1051
  • 21:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cloudcephosd1048,49 - jclark@cumin1002"
  • 21:37 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update cloudcephosd1048,49 - jclark@cumin1002"
  • 21:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:33 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:33 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2005-dev.codfw.wmnet with reason: host reimage
  • 21:31 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:31 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:31 vriley@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1048 - vriley@cumin1002"
  • 21:28 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1048 - vriley@cumin1002"
  • 21:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host es1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1047
  • 21:24 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1047
  • 21:24 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:21 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:20 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1047 - vriley@cumin1002"
  • 21:20 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt es1047 - vriley@cumin1002"
  • 21:16 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:16 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:13 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:13 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2005-dev.codfw.wmnet with OS bookworm
  • 21:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 20:15 sbassett: Deployed security mitigation update for T395468
  • 19:43 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 19:42 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 19:42 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 19:41 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 19:41 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 19:40 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 18:59 bking@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for wdqs2022.codfw.wmnet: Renew puppet certificate - bking@cumin1002
  • 18:39 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-eqsin
  • 18:34 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-codfw
  • 18:28 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@52ec646]: T394526 (duration: 01m 35s)
  • 18:26 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@52ec646]: T394526
  • 18:14 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-eqsin
  • 18:11 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
  • 18:09 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-codfw
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl2002.codfw.wmnet
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm
  • 17:58 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-ctrl2001.codfw.wmnet
  • 17:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm
  • 17:53 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 17:53 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@5c0689d]: sync rdf-spark-tools 0.3.158 artifacts (duration: 00m 19s)
  • 17:52 ebernhardson@deploy1003: Started deploy [airflow-dags/search@5c0689d]: sync rdf-spark-tools 0.3.158 artifacts
  • 17:50 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2006-dev.codfw.wmnet with reason: host reimage
  • 17:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: host reimage
  • 17:43 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: host reimage
  • 17:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: host reimage
  • 17:33 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: host reimage
  • 17:30 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2006-dev.codfw.wmnet with OS bookworm
  • 17:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl2002.codfw.wmnet with OS bookworm
  • 17:25 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:21 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 17:19 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl2002.codfw.wmnet on all recursors
  • 17:18 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl2002.codfw.wmnet on all recursors
  • 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:18 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2002.codfw.wmnet - btullis@cumin1003"
  • 17:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl2001.codfw.wmnet with OS bookworm
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:13 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-ctrl2001.codfw.wmnet on all recursors
  • 17:13 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-ctrl2001.codfw.wmnet on all recursors
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:13 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 17:10 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-ctrl2001.codfw.wmnet - btullis@cumin1003"
  • 17:10 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:10 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 17:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 17:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ml-serve1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 16:58 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 16:53 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 16:52 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 16:48 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-eqiad
  • 16:43 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:43 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:43 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:43 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:42 dancy@deploy1003: Installation of scap version "4.187.0" completed for 2 hosts
  • 16:41 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 16:40 dancy@deploy1003: Installing scap version "4.187.0" for 2 host(s)
  • 16:39 mszabo@deploy1003: Finished scap sync-world: Backport for Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952), Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer", UpdateMessageJobTest: Read expected transver from latest (T398904) (duration: 09m 10s)
  • 16:37 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:37 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:36 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: reverse deps - cdanis@cumin1002
  • 16:36 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: reverse deps - cdanis@cumin1002"
  • 16:35 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 16:34 mszabo@deploy1003: tchanders, mszabo: Continuing with sync
  • 16:33 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 16:32 mszabo@deploy1003: tchanders, mszabo: Backport for Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952), Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer", UpdateMessageJobTest: Read expected transver from latest (T398904) synced to the testservers (see https://wikitech.wikimedia.org
  • 16:31 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 16:30 mszabo@deploy1003: Started scap sync-world: Backport for Revert "Add user-related link colors to LinkRenderer::getLinkClasses" (T392775 T398714 T398717 T398952), Revert "UserLinker: remove back compat with old arguments of UserLinkRenderer", UpdateMessageJobTest: Read expected transver from latest (T398904)
  • 16:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2003.codfw.wmnet with OS bookworm
  • 16:23 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-eqiad
  • 16:22 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-drmrs
  • 16:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5017,5025].eqsin.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 16:12 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:11 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 16:07 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl2002.codfw.wmnet
  • 16:07 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:06 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-ctrl2001.codfw.wmnet
  • 15:57 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-drmrs
  • 15:48 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2002.codfw.wmnet
  • 15:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2002.codfw.wmnet with OS bookworm
  • 15:44 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-esams
  • 15:38 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2003.codfw.wmnet
  • 15:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2003.codfw.wmnet with OS bookworm
  • 15:31 bvibber@deploy1003: Finished scap sync-world: Backport for Support null values in data columns in transform output (T398597) (duration: 08m 52s)
  • 15:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2002.codfw.wmnet with reason: host reimage
  • 15:27 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2002.codfw.wmnet with reason: host reimage
  • 15:25 bvibber@deploy1003: bvibber: Continuing with sync
  • 15:24 bvibber@deploy1003: bvibber: Backport for Support null values in data columns in transform output (T398597) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:22 bvibber@deploy1003: Started scap sync-world: Backport for Support null values in data columns in transform output (T398597)
  • 15:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2003.codfw.wmnet with reason: host reimage
  • 15:20 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-ulsfo
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78815 and previous config saved to /var/cache/conftool/dbconfig/20250708-151939-root.json
  • 15:19 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-esams
  • 15:18 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on A:cp-magru
  • 15:18 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2003.codfw.wmnet with reason: host reimage
  • 15:12 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2002.codfw.wmnet with OS bookworm
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78814 and previous config saved to /var/cache/conftool/dbconfig/20250708-150434-root.json
  • 15:02 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2002.codfw.wmnet with OS bookworm
  • 14:57 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2002.codfw.wmnet - btullis@cumin1003"
  • 14:55 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-ulsfo
  • 14:53 sukhe@cumin1002: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on A:cp-magru
  • 14:53 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2002.codfw.wmnet - btullis@cumin1003"
  • 14:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2002.codfw.wmnet on all recursors
  • 14:53 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2002.codfw.wmnet on all recursors
  • 14:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2003.codfw.wmnet with OS bookworm
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2003.codfw.wmnet on all recursors
  • 14:50 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2003.codfw.wmnet on all recursors
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2003.codfw.wmnet - btullis@cumin1003"
  • 14:50 pmiazga: Ran fixStuckGlobalRename.php for T398837
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78813 and previous config saved to /var/cache/conftool/dbconfig/20250708-144928-root.json
  • 14:47 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin and not P{cp[5017,5025].eqsin.wmnet} and A:cp - 2.8.15 upgrade (T398720)
  • 14:45 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:45 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2003.codfw.wmnet
  • 14:41 moritzm: installing shadow security updates
  • 14:39 btullis@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dse-k8s-etcd2001.codfw.wmnet
  • 14:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd2001.codfw.wmnet with OS bookworm
  • 14:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cephosd2001.codfw.wmnet with OS bookworm
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78812 and previous config saved to /var/cache/conftool/dbconfig/20250708-143422-root.json
  • 14:28 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1185 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78811 and previous config saved to /var/cache/conftool/dbconfig/20250708-142635-marostegui.json
  • 14:26 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:26 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:26 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:23 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:23 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2002.codfw.wmnet
  • 14:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd2001.codfw.wmnet with reason: host reimage
  • 14:18 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd2001.codfw.wmnet with reason: host reimage
  • 13:53 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 13:53 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 13:50 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 13:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 13:40 moritzm: installing werkzeug security updates
  • 13:27 cmooney@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 cmooney@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new ML mega-hosts in eqiad - cmooney@cumin2002"
  • 13:23 cmooney@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new ML mega-hosts in eqiad - cmooney@cumin2002"
  • 13:20 cmooney@cumin2002: START - Cookbook sre.dns.netbox
  • 13:20 moritzm: restart clamav on VRTS to pick up ICU security updates
  • 13:18 moritzm: restarting Postfix on mx* and crm2001 to pick up ICU security updates
  • 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003"
  • 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003
  • 13:17 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003
  • 13:17 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Do not log rate-limiting rules if it wouldn't be applied - oblivian@cumin1003"
  • 13:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd2001.codfw.wmnet with OS bookworm
  • 13:04 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbprov2005.codfw.wmnet,dbprov1005.eqiad.wmnet with reason: MariaDB package update
  • 12:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
  • 12:56 moritzm: installing ICU security updates on Bookworm
  • 12:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
  • 12:56 moritzm: installing ICU security updates
  • 12:55 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:55 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:54 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-etcd2001.codfw.wmnet on all recursors
  • 12:54 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-etcd2001.codfw.wmnet on all recursors
  • 12:54 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:54 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dse-k8s-etcd2001.codfw.wmnet - btullis@cumin1003"
  • 12:54 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2002.codfw.wmnet with reason: host reimage
  • 12:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
  • 12:52 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
  • 12:51 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:51 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:50 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2003.codfw.wmnet with reason: host reimage
  • 12:50 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:49 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:49 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1001.eqiad.wmnet
  • 12:49 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:48 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cephosd2001.codfw.wmnet with reason: host reimage
  • 12:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
  • 12:46 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:46 btullis@cumin1003: START - Cookbook sre.ganeti.makevm for new host dse-k8s-etcd2001.codfw.wmnet
  • 12:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
  • 12:44 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1001.eqiad.wmnet
  • 12:43 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1004.eqiad.wmnet
  • 12:40 moritzm: installing commons-beanutils security updates
  • 12:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
  • 12:39 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host apus-be1004.eqiad.wmnet
  • 12:38 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
  • 12:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2004.codfw.wmnet
  • 12:32 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2002
  • 12:32 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2002
  • 12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host apus-be2004.codfw.wmnet
  • 12:31 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
  • 12:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
  • 12:30 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cephosd2002
  • 12:30 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2002.codfw.wmnet 235.32.192.10.in-addr.arpa 5.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:30 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache cephosd2002.codfw.wmnet 235.32.192.10.in-addr.arpa 5.3.2.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:30 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:30 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
  • 12:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2003
  • 12:28 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2003
  • 12:28 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:27 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cephosd2003
  • 12:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2003.codfw.wmnet 240.48.192.10.in-addr.arpa 0.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:27 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache cephosd2003.codfw.wmnet 240.48.192.10.in-addr.arpa 0.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:26 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cephosd2001
  • 12:26 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cephosd2001
  • 12:26 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:26 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:26 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
  • 12:25 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
  • 12:25 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:24 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cephosd2001
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cephosd2001.codfw.wmnet 133.0.192.10.in-addr.arpa 3.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:24 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache cephosd2001.codfw.wmnet 133.0.192.10.in-addr.arpa 3.3.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cephosd2001 - btullis@cumin1003"
  • 12:24 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cephosd2001 - btullis@cumin1003"
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host cephosd2003
  • 12:21 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cephosd2003.codfw.wmnet with OS bookworm
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host cephosd2002
  • 12:20 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cephosd2002.codfw.wmnet with OS bookworm
  • 12:20 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host cephosd2001
  • 12:20 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host cephosd2001.codfw.wmnet with OS bookworm
  • 12:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 12:12 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 12:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 12:10 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 12:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 12:06 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 12:06 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-eqiad
  • 12:03 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-eqiad
  • 12:02 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
  • 12:00 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 12:00 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 12:00 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
  • 11:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
  • 11:54 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cephosd[2001-2003].codfw.wmnet with reason: Bootstrapping new ceph cluster
  • 11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:52 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
  • 11:52 moritzm: restarting FPM on Phabricator nodes to pick up OpenSSL updates
  • 11:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 11:49 moritzm: restarting exim on Phabricator nodes to pick up OpenSSL updates
  • 11:44 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 11:42 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
  • 11:39 jynus: upgrade db2201 mariadb package T394487
  • 11:37 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
  • 11:36 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
  • 11:35 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet
  • 11:35 hashar: Restarted Apache on gerrit1003 and gerrit2002
  • 11:31 zabe@deploy1003: Finished scap sync-world: Backport for Remove redundant group0 config for categorylinks, Set categorylinks to read new in cebwiki (T397912) (duration: 09m 35s)
  • 11:29 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 11:29 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 11:27 moritzm: restarting apache on mirror1001 to pick up openssl sec updates
  • 11:27 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
  • 11:25 zabe@deploy1003: zabe: Continuing with sync
  • 11:24 zabe@deploy1003: zabe: Backport for Remove redundant group0 config for categorylinks, Set categorylinks to read new in cebwiki (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:24 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
  • 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78807 and previous config saved to /var/cache/conftool/dbconfig/20250708-112344-root.json
  • 11:22 zabe@deploy1003: Started scap sync-world: Backport for Remove redundant group0 config for categorylinks, Set categorylinks to read new in cebwiki (T397912)
  • 11:20 jynus: upgrade db1216 mariadb package T394487
  • 11:15 moritzm: restarting slapd on seaborgium/serpens to pick up OpenSSL updates
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 11:09 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78806 and previous config saved to /var/cache/conftool/dbconfig/20250708-110838-root.json
  • 11:07 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P78805 and previous config saved to /var/cache/conftool/dbconfig/20250708-110656-root.json
  • 11:06 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 11:06 jmm@cumin1002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
  • 11:04 jmm@cumin1002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
  • 11:03 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet,db1216.eqiad.wmnet with reason: MariaDB package update
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 11:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
  • 10:56 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
  • 10:56 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 10:54 Emperor: reboot apus frontends in codfw T395240
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78803 and previous config saved to /var/cache/conftool/dbconfig/20250708-105332-root.json
  • 10:52 ladsgroup@deploy1003: Finished scap sync-world: Backport for Fully get rid of tracking and updating pages (T398033), api-testing: Loosen the assert on max-age header, Fully get rid of tracking and updating pages (T398033) (duration: 09m 33s)
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P78802 and previous config saved to /var/cache/conftool/dbconfig/20250708-105151-root.json
  • 10:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1005.eqiad.wmnet
  • 10:47 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 10:45 ladsgroup@deploy1003: ladsgroup: Backport for Fully get rid of tracking and updating pages (T398033), api-testing: Loosen the assert on max-age header, Fully get rid of tracking and updating pages (T398033) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:44 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1005.eqiad.wmnet
  • 10:42 ladsgroup@deploy1003: Started scap sync-world: Backport for Fully get rid of tracking and updating pages (T398033), api-testing: Loosen the assert on max-age header, Fully get rid of tracking and updating pages (T398033)
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78801 and previous config saved to /var/cache/conftool/dbconfig/20250708-103826-root.json
  • 10:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 10:37 Emperor: reboot apus frontends in eqiad T395240
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P78800 and previous config saved to /var/cache/conftool/dbconfig/20250708-103645-root.json
  • 10:34 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78799 and previous config saved to /var/cache/conftool/dbconfig/20250708-103106-marostegui.json
  • 10:31 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1159 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78798 and previous config saved to /var/cache/conftool/dbconfig/20250708-102746-root.json
  • 10:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1004.eqiad.wmnet
  • 10:21 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1004.eqiad.wmnet
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P78797 and previous config saved to /var/cache/conftool/dbconfig/20250708-102140-root.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1159 for migration to mariadb 10.11', diff saved to https://phabricator.wikimedia.org/P78796 and previous config saved to /var/cache/conftool/dbconfig/20250708-102114-marostegui.json
  • 10:21 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:20 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 10:14 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 10:14 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 10:12 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:11 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 10:07 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P78795 and previous config saved to /var/cache/conftool/dbconfig/20250708-100434-marostegui.json
  • 10:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:53 Amir1: dropping term store tables on s8 sanitarium master (T351820)
  • 09:51 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:51 moritzm: installing openssl security updates on Bullseye
  • 09:51 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:51 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:50 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:41 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:41 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:15 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.9 refs T392179
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-eqiad
  • 09:04 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-eqiad
  • 08:59 aklapper@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.9 refs T392179 (duration: 43m 18s)
  • 08:52 moritzm: installing nginx security updates
  • 08:48 moritzm: installing Redis security updates
  • 08:30 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 08:30 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
  • 08:30 moritzm: created a stub user "bumpuid" to move the allocation of UIDs for accounts created in Wikimedia IDM to 100000+ T355663
  • 08:30 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:28 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:26 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:26 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
  • 08:16 aklapper@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.9 refs T392179
  • 08:11 moritzm: installing postgresql-15 security updates
  • 08:11 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:11 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:06 gmodena@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:06 gmodena@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:02 gmodena@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:01 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:55 fabfur: enabling puppet on A:cp (T329332)
  • 07:54 marostegui: Migrate s3 eqiad to SBR T383795
  • 07:45 fabfur: temporary disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/1135643 (T329332)
  • 07:42 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 07:42 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:30 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 07:19 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 07:14 tchanders@deploy1003: Finished scap sync-world: Backport for temp accounts: Separate digits in user names with hyphens (T381845) (duration: 11m 02s)
  • 07:09 tchanders@deploy1003: tchanders: Continuing with sync
  • 07:05 tchanders@deploy1003: tchanders: Backport for temp accounts: Separate digits in user names with hyphens (T381845) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:03 tchanders@deploy1003: Started scap sync-world: Backport for temp accounts: Separate digits in user names with hyphens (T381845)
  • 06:35 moritzm: rebalance following reimages T382513
  • 06:31 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Revert - oblivian@cumin1003"
  • 06:31 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert - oblivian@cumin1003
  • 06:30 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert - oblivian@cumin1003
  • 06:30 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Revert - oblivian@cumin1003"
  • 06:15 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix varnish logging (take 2) - oblivian@cumin1003"
  • 06:14 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix varnish logging (take 2) - oblivian@cumin1003
  • 06:14 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix varnish logging (take 2) - oblivian@cumin1003
  • 06:14 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix varnish logging (take 2) - oblivian@cumin1003"
  • 05:52 marostegui: Migrate s3 codfw to SBR T383795
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78792 and previous config saved to /var/cache/conftool/dbconfig/20250708-054825-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P78791 and previous config saved to /var/cache/conftool/dbconfig/20250708-054329-root.json
  • 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:43 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:42 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:42 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:42 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Reverty - oblivian@cumin1003
  • 05:41 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Reverty - oblivian@cumin1003"
  • 05:35 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: better logging of varnish rate-limits - oblivian@cumin1003"
  • 05:35 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: better logging of varnish rate-limits - oblivian@cumin1003
  • 05:35 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: better logging of varnish rate-limits - oblivian@cumin1003
  • 05:35 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: better logging of varnish rate-limits - oblivian@cumin1003"
  • 05:33 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2003.wikimedia.org with reason: WIP
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78790 and previous config saved to /var/cache/conftool/dbconfig/20250708-053320-root.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P78789 and previous config saved to /var/cache/conftool/dbconfig/20250708-052823-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78788 and previous config saved to /var/cache/conftool/dbconfig/20250708-051814-root.json
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P78787 and previous config saved to /var/cache/conftool/dbconfig/20250708-051318-root.json
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78786 and previous config saved to /var/cache/conftool/dbconfig/20250708-050308-root.json
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P78785 and previous config saved to /var/cache/conftool/dbconfig/20250708-045812-root.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1237 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P78784 and previous config saved to /var/cache/conftool/dbconfig/20250708-044803-root.json
  • 04:39 marostegui@dns1006: END - running authdns-update
  • 04:38 marostegui@dns1006: START - running authdns-update
  • 04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T398906', diff saved to https://phabricator.wikimedia.org/P78783 and previous config saved to /var/cache/conftool/dbconfig/20250708-043814-marostegui.json
  • 04:38 marostegui@dns1006: END - running authdns-update
  • 04:37 marostegui@dns1006: START - running authdns-update
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T398906', diff saved to https://phabricator.wikimedia.org/P78782 and previous config saved to /var/cache/conftool/dbconfig/20250708-043654-root.json
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T398906', diff saved to https://phabricator.wikimedia.org/P78781 and previous config saved to /var/cache/conftool/dbconfig/20250708-043628-root.json
  • 04:36 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T398906
  • 04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T398906', diff saved to https://phabricator.wikimedia.org/P78780 and previous config saved to /var/cache/conftool/dbconfig/20250708-042646-root.json
  • 04:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T398906
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.6 (duration: 04m 24s)
  • 02:42 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 02:23 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 02:20 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 01:59 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 00:21 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 00:20 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply

2025-07-07

  • 22:58 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 22:35 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 22:16 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 22:13 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 21:58 maryum: Deployed security fix for T397577
  • 21:52 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 21:36 maryum: Deployed security fix for T398636
  • 21:11 zabe@deploy1003: Finished scap sync-world: Backport for Straight join collation table to make sure it is last (T398860) (duration: 10m 33s)
  • 21:05 zabe@deploy1003: zabe: Continuing with sync
  • 21:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 21:02 zabe@deploy1003: zabe: Backport for Straight join collation table to make sure it is last (T398860) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:01 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 21:00 zabe@deploy1003: Started scap sync-world: Backport for Straight join collation table to make sure it is last (T398860)
  • 20:33 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 20:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to drbd
  • 20:16 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 20:13 ebernhardson@deploy1003: Finished scap sync-world: Backport for cirrus: Start AB test of completion suggester fuzziness (T397732) (duration: 10m 28s)
  • 20:12 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2007-dev.codfw.wmnet with reason: host reimage
  • 20:11 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 20:10 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 20:07 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:05 ebernhardson@deploy1003: ebernhardson: Backport for cirrus: Start AB test of completion suggester fuzziness (T397732) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:02 ebernhardson@deploy1003: Started scap sync-world: Backport for cirrus: Start AB test of completion suggester fuzziness (T397732)
  • 19:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to drbd
  • 19:51 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd2007-dev.codfw.wmnet with OS bookworm
  • 19:39 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 19:38 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 19:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1179.eqiad.wmnet with OS bullseye
  • 19:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:31 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1179.eqiad.wmnet with reason: host reimage
  • 19:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1179.eqiad.wmnet with reason: host reimage
  • 18:59 bvibber@deploy1003: Finished scap sync-world: Backport for Fix for validation error display in transformed chart data (T398597) (duration: 08m 40s)
  • 18:58 sukhe: sukhe@cp7006:/var/run/confd-template$ sudo rm _etc_haproxy_conf.d_tls.cfg.err
  • 18:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1179.eqiad.wmnet with OS bullseye
  • 18:55 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 18:54 bvibber@deploy1003: bvibber: Continuing with sync
  • 18:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1179.eqiad.wmnet with OS bullseye
  • 18:53 bvibber@deploy1003: bvibber: Backport for Fix for validation error display in transformed chart data (T398597) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:51 bvibber@deploy1003: Started scap sync-world: Backport for Fix for validation error display in transformed chart data (T398597)
  • 18:40 zabe@deploy1003: Finished scap sync-world: Backport for Revert^2 "Set categorylinks to read new in medium wikis" (T397912) (duration: 09m 54s)
  • 18:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1179.eqiad.wmnet with OS bullseye
  • 18:35 zabe@deploy1003: zabe: Continuing with sync
  • 18:32 zabe@deploy1003: zabe: Backport for Revert^2 "Set categorylinks to read new in medium wikis" (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:31 zabe@deploy1003: Started scap sync-world: Backport for Revert^2 "Set categorylinks to read new in medium wikis" (T397912)
  • 18:12 zabe@deploy1003: Finished scap sync-world: Backport for Apply conditions to correct column (T398823) (duration: 11m 14s)
  • 18:10 urandom: bootstrapping Cassandra/sessionstore1006-a — T391544
  • 18:09 sukhe@dns1004: END - running authdns-update
  • 18:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
  • 18:08 sukhe@dns1004: START - running authdns-update
  • 18:06 zabe@deploy1003: zabe: Continuing with sync
  • 18:04 sukhe: [end] rolling upgrade of haproxy on A:dnsbox to 2.6.12-1+deb12u2
  • 18:03 zabe@deploy1003: zabe: Backport for Apply conditions to correct column (T398823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:00 zabe@deploy1003: Started scap sync-world: Backport for Apply conditions to correct column (T398823)
  • 17:58 bking@cumin1002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 17:58 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 17:58 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 17:58 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
  • 17:57 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 17:45 sukhe: [start] rolling upgrade of haproxy on A:dnsbox to 2.6.12-1+deb12u2
  • 17:40 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search-omega*,name=eqiad
  • 17:40 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search-psi*,name=eqiad
  • 17:40 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search*,name=eqiad
  • 17:35 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 17:12 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 17:09 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 17:07 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search-psi*,name=eqiad
  • 17:07 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search-omega*,name=eqiad
  • 17:06 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search*,name=eqiad
  • 17:05 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 16:49 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:49 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 16:49 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:43 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
  • 16:38 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:20 eevans@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 16:16 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 16:15 bking@cumin1002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 16:08 elukey: kafka preferred-replica-election on kafka1011 to rebalance partition leaders on kafka-jumbo
  • 16:04 elukey: restart kafka on kafka1015 (fourth and last node without restart in the previous cookbook run)
  • 16:02 elukey: restart kafka on kafka1014 (third node without restart in the previous cookbook run)
  • 16:00 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 15:59 elukey: restart kafka on kafka1013 (second node without restart in the previous cookbook run)
  • 15:56 elukey: restart kafka on kafka1012 (first node without restart in the previous cookbook run)
  • 15:55 elukey: kafka-preferred-replica on kafka-jumbo
  • 15:46 moritzm: installing busybox updates from Bookworm point release
  • 15:37 moritzm: installing zsh updates from Bookworm point release
  • 15:33 moritzm: installing postgresql security updates
  • 15:28 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling reboot on A:schema-codfw
  • 15:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-be[2001-2004].codfw.wmnet
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 15:24 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-be[2001-2004].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 15:24 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:24 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:22 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:22 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling reboot on A:schema-codfw
  • 15:21 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 15:18 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
  • 15:18 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:18 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:17 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:17 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:12 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
  • 15:10 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts thanos-be[2001-2004].codfw.wmnet
  • 15:02 brouberol@cumin2002: END (FAIL) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=99) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 15:02 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 15:00 sukhe: sudo cumin -b1 -s120 'A:dnsbox and not P{dns7001*}' "run-puppet-agent --enable 'merging CR 1166223'": T374619
  • 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl1002.eqiad.wmnet
  • 14:58 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: [done] testing CR 1166223: T374619]
  • 14:58 vgutierrez: switching lvs3010 to katran - T396561
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl1002.eqiad.wmnet
  • 14:54 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing CR 1166223: T374619]
  • 14:47 sukhe: sudo cumin 'A:dnsbox' "disable-puppet 'merging CR 1166223'": rolling out prom metrics for anycast-hc: T374619
  • 14:46 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 14:42 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1006.eqiad.wmnet with reason: host reimage
  • 14:41 vgutierrez: switching lvs6003 to katran - T396561
  • 14:39 sukhe: sudo cumin -b1 -s10 'A:wikidough' "run-puppet-agent --enable 'merging CR 1166838'"
  • 14:38 sukhe: sudo cumin -s1 -b10 'A:wikidough' "run-puppet-agent --enable 'merging CR 1166838'"
  • 14:32 sukhe: sudo cumin 'A:wikidough' "disable-puppet 'merging CR 1166838'"
  • 14:28 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: activate new plugins packages - bking@cumin1002 - T397227
  • 14:26 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 14:25 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 14:22 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 14:22 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 14:18 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 14:14 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1006.eqiad.wmnet with OS bullseye
  • 14:10 urandom: decommissioning Cassandra/sessionstore-a — T391544
  • 14:09 sukhe: sudo cumin -b1 -s10 'A:dnsbox' "run-puppet-agent --enable 'merging CR 1166210'"
  • 14:07 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2006-dev.codfw.wmnet with OS bookworm
  • 14:05 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 14:03 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@f79034f]: remove dumps 1.0 sensor from SLIS (duration: 00m 46s)
  • 14:02 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@f79034f]: remove dumps 1.0 sensor from SLIS
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl1003.eqiad.wmnet
  • 13:54 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 18.0
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl1003.eqiad.wmnet
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet
  • 13:51 sukhe: sudo cumin 'A:dnsbox' "disable-puppet 'merging CR 1166210'"
  • 13:49 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 13:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2002.codfw.wmnet
  • 13:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1069.eqiad.wmnet
  • 13:47 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1069.eqiad.wmnet
  • 13:46 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker1069.eqiad.wmnet
  • 13:46 cgoubert@cumin1003: START - Cookbook sre.hosts.remove-downtime for wikikube-worker1069.eqiad.wmnet
  • 13:45 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2006-dev.codfw.wmnet with reason: host reimage
  • 13:45 zabe@deploy1003: Finished scap sync-world: Backport for Revert "Set categorylinks to read new in medium wikis" (duration: 07m 59s)
  • 13:45 claime: homer "cr*eqiad*" commit 'wikikube-worker1069 back to active'
  • 13:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:40 zabe@deploy1003: zabe: Continuing with sync
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet
  • 13:39 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 13:39 zabe@deploy1003: zabe: Backport for Revert "Set categorylinks to read new in medium wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 zabe@deploy1003: Started scap sync-world: Backport for Revert "Set categorylinks to read new in medium wikis"
  • 13:36 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
  • 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2003.codfw.wmnet
  • 13:34 sukhe: sudo cumin -b11 'C:bird' "run-puppet-agent --enable 'merging CR 1166222'": NOOP change
  • 13:31 zabe@deploy1003: zabe: Continuing with sync
  • 13:30 zabe@deploy1003: zabe: Backport for Set categorylinks to read new in medium wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:28 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new in medium wikis (T397912)
  • 13:28 sukhe: sudo cumin 'C:bird' "disable-puppet 'merging CR 1166222'"
  • 13:27 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2006-dev.codfw.wmnet with OS bookworm
  • 13:26 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bookworm
  • 13:19 ladsgroup@deploy1003: Finished scap sync-world: Backport for mrwiki: Correct draft namespace spelling (T398792) (duration: 09m 26s)
  • 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2146* gradually with 4 steps - Work done
  • 13:13 ladsgroup@deploy1003: ladsgroup, hamishz: Continuing with sync
  • 13:11 ladsgroup@deploy1003: ladsgroup, hamishz: Backport for mrwiki: Correct draft namespace spelling (T398792) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:09 ladsgroup@deploy1003: Started scap sync-world: Backport for mrwiki: Correct draft namespace spelling (T398792)
  • 13:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for Drop ability to use VueTest on a wiki (T357475) (duration: 37m 21s)
  • 13:07 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 13:05 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 12:59 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:59 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:58 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.topology-check (exit_code=99) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:58 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
  • 12:57 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 12:57 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2002)
  • 12:55 ladsgroup@deploy1003: ladsgroup, jforrester: Continuing with sync
  • 12:54 ladsgroup@deploy1003: ladsgroup, jforrester: Backport for Drop ability to use VueTest on a wiki (T357475) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:47 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bookworm
  • 12:35 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2042.codfw.wmnet
  • 12:35 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2042.codfw.wmnet
  • 12:34 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2046.codfw.wmnet
  • 12:34 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2046.codfw.wmnet
  • 12:32 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2146* gradually with 4 steps - Work done
  • 12:30 ladsgroup@deploy1003: Started scap sync-world: Backport for Drop ability to use VueTest on a wiki (T357475)
  • 12:28 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert "Increase max db connection count before circuit breaking" (T398692) (duration: 08m 13s)
  • 12:22 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:21 ladsgroup@deploy1003: ladsgroup: Backport for Revert "Increase max db connection count before circuit breaking" (T398692) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:19 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert "Increase max db connection count before circuit breaking" (T398692)
  • 12:18 ladsgroup@deploy1003: Finished scap sync-world: Backport for Use dblist for wikilove (duration: 12m 28s)
  • 12:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS trixie
  • 12:10 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to drbd
  • 12:08 ladsgroup@deploy1003: ladsgroup: Backport for Use dblist for wikilove synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:06 ladsgroup@deploy1003: Started scap sync-world: Backport for Use dblist for wikilove
  • 12:04 XioNoX: reboot lsw1-a8-codfw - T398433
  • 12:03 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2046.codfw.wmnet
  • 12:03 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2046.codfw.wmnet
  • 12:02 akosiaris@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2042.codfw.wmnet
  • 12:02 akosiaris@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2042.codfw.wmnet
  • 12:00 ladsgroup@deploy1003: Finished scap sync-world: Backport for Revert^2 "Clean up EventBus and jobs config" (duration: 35m 06s)
  • 12:00 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to drbd
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to drbd
  • 11:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Just in case (T398433)
  • 11:56 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS trixie
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db2146 T398433', diff saved to https://phabricator.wikimedia.org/P78771 and previous config saved to /var/cache/conftool/dbconfig/20250707-115457-ladsgroup.json
  • 11:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to drbd
  • 11:47 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 11:46 ladsgroup@deploy1003: ladsgroup: Backport for Revert^2 "Clean up EventBus and jobs config" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 11:42 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS trixie
  • 11:25 ladsgroup@deploy1003: Started scap sync-world: Backport for Revert^2 "Clean up EventBus and jobs config"
  • 11:10 root@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 11:06 root@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 11:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1250.eqiad.wmnet with reason: Maintenance
  • 11:02 moritzm: installing modsecurity-apache security updates
  • 10:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1217.eqiad.wmnet with reason: Maintenance
  • 10:45 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 10:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2234].codfw.wmnet with reason: Maintenance
  • 10:35 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 10:30 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 10:24 Emperor: remove swift-account-stats_machinetranslation:prod time & service from thanos-fe1004 T335491
  • 10:17 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 10:13 root@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 10:09 root@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 09:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 09:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 09:42 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 09:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 09:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1250.eqiad.wmnet with reason: Maintenance
  • 09:18 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 09:13 marostegui: Failover m2 from db1250 to db1228 - T397633
  • 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 09:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2233].codfw.wmnet,db[1217,1228,1250].eqiad.wmnet with reason: maintenance
  • 08:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1237.eqiad.wmnet with reason: Maintenance
  • 08:01 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: logging of deny actions; add rename functionality - oblivian@cumin1003"
  • 08:01 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: logging of deny actions; add rename functionality - oblivian@cumin1003
  • 08:00 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: logging of deny actions; add rename functionality - oblivian@cumin1003
  • 08:00 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: logging of deny actions; add rename functionality - oblivian@cumin1003"
  • 08:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1237.eqiad.wmnet with reason: Maintenance
  • 07:53 vgutierrez: repooling cp7006 with Ia82b93 applied - T397917
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1237 T397612', diff saved to https://phabricator.wikimedia.org/P78763 and previous config saved to /var/cache/conftool/dbconfig/20250707-075308-root.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1220 to x1 primary and set section read-write T397612', diff saved to https://phabricator.wikimedia.org/P78762 and previous config saved to /var/cache/conftool/dbconfig/20250707-075254-root.json
  • 07:51 marostegui@dns1006: END - running authdns-update
  • 07:50 marostegui@dns1006: START - running authdns-update
  • 07:25 vgutierrez: depooling cp7006 to test Ia82b93 - T397917
  • 07:25 marostegui: Starting x1 eqiad failover from db1237 to db1220 - T397612
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1220 with weight 0 T397612', diff saved to https://phabricator.wikimedia.org/P78760 and previous config saved to /var/cache/conftool/dbconfig/20250707-072157-root.json
  • 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Primary switchover x1 T397612
  • 07:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 07:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 07:04 vgutierrez: testing haproxy 2.8.15 in cp5017 and cp5025 - T398720
  • 06:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 06:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply

2025-07-04

  • 21:39 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Change loginwiki/metawiki/auth canonical to beta.wmcloud.org (T289318) (duration: 18m 12s)
  • 21:33 krinkle@deploy1003: krinkle: Continuing with sync
  • 21:23 krinkle@deploy1003: krinkle: Backport for beta: Change loginwiki/metawiki/auth canonical to beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:21 krinkle@deploy1003: Started scap sync-world: Backport for beta: Change loginwiki/metawiki/auth canonical to beta.wmcloud.org (T289318)
  • 20:32 krinkle@deploy1003: Finished scap sync-world: Backport for beta: Include allowance for wmcloud.org in wgGraphAllowedDomains (T289318), beta: Change Beta wikidata canonical to beta.wmcloud.org (T289318) (duration: 94m 52s)
  • 20:26 krinkle@deploy1003: krinkle: Continuing with sync
  • 18:59 krinkle@deploy1003: krinkle: Backport for beta: Include allowance for wmcloud.org in wgGraphAllowedDomains (T289318), beta: Change Beta wikidata canonical to beta.wmcloud.org (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:57 krinkle@deploy1003: Started scap sync-world: Backport for beta: Include allowance for wmcloud.org in wgGraphAllowedDomains (T289318), beta: Change Beta wikidata canonical to beta.wmcloud.org (T289318)
  • 15:14 vgutierrez: fetch haproxy 2.8.15 on thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
  • 14:46 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 14:40 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1179.eqiad.wmnet with OS bullseye
  • 14:36 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 14:29 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 14:20 vgutierrez: repooling cp7006
  • 14:20 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 14:12 vgutierrez: depooling cp7006 for testing purposes
  • 14:09 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 14:06 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1179.eqiad.wmnet with OS bullseye
  • 14:01 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 13:15 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 13:08 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 12:59 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 12:51 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 12:31 vgutierrez: repool cp7006
  • 12:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7006.magru.wmnet
  • 12:31 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7006.magru.wmnet
  • 12:11 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@38ba3ec]: bump section topics to v1.8.0 (duration: 00m 49s)
  • 12:11 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@38ba3ec]: bump section topics to v1.8.0
  • 11:08 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v10.0.2 with ibgp function in plugin - cmooney@cumin1003
  • 11:05 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v10.0.2 with ibgp function in plugin - cmooney@cumin1003
  • 10:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 32 hosts with reason: maintenance
  • 10:51 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2006.codfw.wmnet with OS bookworm
  • 10:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2203,2212].codfw.wmnet with reason: Maintenance
  • 10:41 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2006.codfw.wmnet with OS bookworm
  • 10:27 cgoubert@deploy1003: Unlocked for deployment [ALL REPOSITORIES]: Dragonfly supernodes reboot (duration: 09m 07s)
  • 10:26 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 10:23 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 10:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
  • 10:18 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 10:18 cgoubert@deploy1003: Locking from deployment [ALL REPOSITORIES]: Dragonfly supernodes reboot
  • 10:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:01 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backupmon1001.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to drbd
  • 09:07 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backupmon1001.eqiad.wmnet with reason: Maintenance and reboot
  • 08:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to drbd
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 08:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 08:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6001.drmrs.wmnet to cluster drmrs01 and group B12
  • 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6001.drmrs.wmnet to cluster drmrs01 and group B12
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 08:04 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2020.codfw.wmnet
  • 08:04 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:04 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6001.drmrs.wmnet with OS bookworm
  • 08:03 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2020.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:58 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:56 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: testing
  • 07:53 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ganeti2020.codfw.wmnet
  • 07:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2019.codfw.wmnet
  • 07:53 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:53 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:53 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2019.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6001.drmrs.wmnet with reason: host reimage
  • 07:42 vgutierrez: depooling cp7006 for testing purposes
  • 07:42 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 07:39 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6001.drmrs.wmnet with reason: host reimage
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ganeti2019.codfw.wmnet
  • 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6001.drmrs.wmnet with OS bookworm
  • 07:19 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6001.drmrs.wmnet with reason: reimage
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2003.codfw.wmnet
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver2003.codfw.wmnet
  • 06:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1002.eqiad.wmnet
  • 06:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc1002.eqiad.wmnet
  • 06:32 moritzm: failover Ganeti master in drmrs01 to ganeti6003 T382513
  • 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to plain
  • 06:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to plain
  • 06:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to plain
  • 06:29 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to plain
  • 06:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 06:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1001.eqiad.wmnet
  • 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to plain
  • 06:24 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to plain
  • 06:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc1001.eqiad.wmnet
  • 06:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
  • 06:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
  • 04:32 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 04:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:52 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:46 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:44 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:44 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:22 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 03:21 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1042
  • 03:20 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1042
  • 03:19 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:19 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [cloudcephosd1042] - vriley@cumin1002"
  • 03:19 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [cloudcephosd1042] - vriley@cumin1002"
  • 03:15 vriley@cumin1002: START - Cookbook sre.dns.netbox

2025-07-03

  • 21:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to plain
  • 21:19 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to plain
  • 21:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 21:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 21:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to drbd
  • 21:16 zabe@deploy1003: Finished scap sync-world: Backport for special: Do not throw ErrorPageError from getRedirect() (T398167), Set categorylinks to read new on small wikis (T397912) (duration: 08m 37s)
  • 21:11 zabe@deploy1003: kharlan, zabe: Continuing with sync
  • 21:09 zabe@deploy1003: kharlan, zabe: Backport for special: Do not throw ErrorPageError from getRedirect() (T398167), Set categorylinks to read new on small wikis (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:08 zabe@deploy1003: Started scap sync-world: Backport for special: Do not throw ErrorPageError from getRedirect() (T398167), Set categorylinks to read new on small wikis (T397912)
  • 20:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 20:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to drbd
  • 20:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 20:47 arlolra@deploy1003: Finished scap sync-world: Backport for Use FallbackContentHandler for undeployed JsonConfig content handlers (T124748), ExtensionDistributor: Mark 1.44 as stable; remove 1.42 as EOL (T390798 T389313) (duration: 08m 27s)
  • 20:41 arlolra@deploy1003: arlolra, matmarex: Continuing with sync
  • 20:40 arlolra@deploy1003: arlolra, matmarex: Backport for Use FallbackContentHandler for undeployed JsonConfig content handlers (T124748), ExtensionDistributor: Mark 1.44 as stable; remove 1.42 as EOL (T390798 T389313) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:38 arlolra@deploy1003: Started scap sync-world: Backport for Use FallbackContentHandler for undeployed JsonConfig content handlers (T124748), ExtensionDistributor: Mark 1.44 as stable; remove 1.42 as EOL (T390798 T389313)
  • 20:36 cscott@deploy1003: Finished scap sync-world: Backport for skin: Omit "rendered with" phrase when the message is disabled (T398616) (duration: 08m 30s)
  • 20:30 cscott@deploy1003: cscott: Continuing with sync
  • 20:29 cscott@deploy1003: cscott: Backport for skin: Omit "rendered with" phrase when the message is disabled (T398616) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 cscott@deploy1003: Started scap sync-world: Backport for skin: Omit "rendered with" phrase when the message is disabled (T398616)
  • 20:12 zabe@deploy1003: Finished scap sync-world: Backport for Use correct index on categorylinks (T385890) (duration: 08m 32s)
  • 20:06 zabe@deploy1003: zabe: Continuing with sync
  • 20:05 zabe@deploy1003: zabe: Backport for Use correct index on categorylinks (T385890) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 zabe@deploy1003: Started scap sync-world: Backport for Use correct index on categorylinks (T385890)
  • 19:36 joal@deploy1003: Finished deploy [airflow-dags/analytics@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics (duration: 01m 02s)
  • 19:35 joal@deploy1003: Started deploy [airflow-dags/analytics@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics
  • 19:34 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics_test (duration: 00m 16s)
  • 19:34 joal@deploy1003: Started deploy [airflow-dags/analytics_test@7ba4a7b]: BUGFIX - Synchronize artifact for airflow_dags/analytics_test
  • 17:33 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 17:26 joal@deploy1003: Finished deploy [airflow-dags/analytics@9088e59]: Synchronize artifacts for airflow_dags/analytics (duration: 00m 40s)
  • 17:25 joal@deploy1003: Started deploy [airflow-dags/analytics@9088e59]: Synchronize artifacts for airflow_dags/analytics
  • 17:24 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@9088e59]: Synchronize artifact for airflow_dags/analytics_test (duration: 00m 15s)
  • 17:24 joal@deploy1003: Started deploy [airflow-dags/analytics_test@9088e59]: Synchronize artifact for airflow_dags/analytics_test
  • 17:18 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1176.eqiad.wmnet with reason: host reimage
  • 17:15 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1176.eqiad.wmnet with reason: host reimage
  • 17:13 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 17:13 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 17:13 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 17:12 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 17:01 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 16:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 16:32 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 16:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 16:31 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 16:11 vgutierrez: repooling cp7006
  • 16:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7006.magru.wmnet
  • 16:09 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7006.magru.wmnet
  • 15:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:46 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:46 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 15:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1176.eqiad.wmnet with OS bullseye
  • 15:34 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: testing
  • 15:33 vgutierrez: depooling cp7006 for testing
  • 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T395241)', diff saved to https://phabricator.wikimedia.org/P78755 and previous config saved to /var/cache/conftool/dbconfig/20250703-153141-fceratto.json
  • 15:25 jmm@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker-codfw
  • 15:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 15:22 vgutierrez: lvs5006 migrated to katran - T396561
  • 15:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5006.eqsin.wmnet
  • 15:21 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs5006.eqsin.wmnet
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P78754 and previous config saved to /var/cache/conftool/dbconfig/20250703-151633-fceratto.json
  • 15:10 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs5006.eqsin.wmnet with reason: katran migration
  • 15:04 jmm@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker-codfw
  • 15:04 jmm@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker-eqiad
  • 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P78753 and previous config saved to /var/cache/conftool/dbconfig/20250703-150126-fceratto.json
  • 14:56 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1007.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 14:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1005.eqiad.wmnet
  • 14:51 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry1005.eqiad.wmnet
  • 14:50 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1006.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 14:50 volans: uploaded debmonitor-server,python3-debmonitor_0.6.6 to apt.wikimedia.org bookworm-wikimedia
  • 14:49 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
  • 14:48 vgutierrez: repooling cp7006
  • 14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T395241)', diff saved to https://phabricator.wikimedia.org/P78752 and previous config saved to /var/cache/conftool/dbconfig/20250703-144619-fceratto.json
  • 14:45 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
  • 14:45 jmm@dns1004: END - running authdns-update
  • 14:44 jmm@dns1004: START - running authdns-update
  • 14:43 jmm@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker-eqiad
  • 14:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T395241)', diff saved to https://phabricator.wikimedia.org/P78751 and previous config saved to /var/cache/conftool/dbconfig/20250703-143854-fceratto.json
  • 14:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 14:32 moritzm: installing bootstrap4 security updates
  • 14:23 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1176.eqiad.wmnet with OS bullseye
  • 14:17 vgutierrez: depooling cp7006 for testing
  • 14:09 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1007.eqiad.wmnet with reason: Maintenance and reboot
  • 14:08 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1006.eqiad.wmnet with reason: Maintenance and reboot
  • 14:05 moritzm: restarting clamav to pick up libxml security updates
  • 14:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 13:59 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 13:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:46 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
  • 13:46 sukhe: sudo cumin 'A:wikidough' "disable-puppet 'merging CR 1163859'"
  • 13:45 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2005.codfw.wmnet
  • 13:40 moritzm: installing libxml2 security updates on bookworm
  • 13:40 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
  • 13:40 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
  • 13:39 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to drbd
  • 13:35 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to drbd
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to drbd
  • 13:22 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 13:21 sukhe: sudo cumin -b11 'C:bird' "run-puppet-agent --enable 'merging CR 1163858'": NOOP change T374619
  • 13:20 TheresNoTime: done UTC afternoon backport window
  • 13:18 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2005-dev.codfw.wmnet with reason: host reimage
  • 13:18 samtar@deploy1003: Finished scap sync-world: Backport for InitialiseSettings: Enable wgTemplateDataEnableDiscovery as default (T377978), Allow abusefilter block action on plwikiquote (T398137) (duration: 14m 04s)
  • 13:18 sukhe: sudo cumin 'C:bird' "disable-puppet 'merging CR 1163858'": T374619
  • 13:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to drbd
  • 13:11 samtar@deploy1003: samtar, eggroll97: Continuing with sync
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 13:08 samtar@deploy1003: samtar, eggroll97: Backport for InitialiseSettings: Enable wgTemplateDataEnableDiscovery as default (T377978), Allow abusefilter block action on plwikiquote (T398137) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 samtar@deploy1003: Started scap sync-world: Backport for InitialiseSettings: Enable wgTemplateDataEnableDiscovery as default (T377978), Allow abusefilter block action on plwikiquote (T398137)
  • 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to drbd
  • 12:59 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2005-dev.codfw.wmnet with OS bullseye
  • 12:54 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@09893e3]: bump section topics to v1.7.0 (duration: 03m 20s)
  • 12:51 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@09893e3]: bump section topics to v1.7.0
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to drbd
  • 11:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:45 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
  • 11:45 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
  • 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to drbd
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6003.drmrs.wmnet to cluster drmrs01 and group B12
  • 11:37 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1005.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 11:35 jiji@deploy1003: Finished scap sync-world: T397907 - Upgrade Excimer to 1.2.5 in production (duration: 06m 59s)
  • 11:30 jiji@deploy1003: Started scap sync-world: T397907 - Upgrade Excimer to 1.2.5 in production
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6003.drmrs.wmnet to cluster drmrs01 and group B12
  • 11:27 jiji@deploy1003: Unlocked for deployment [ALL REPOSITORIES]: T397907 - Upgrade Excimer to 1.2.5 in production in progress, blocking deploys (duration: 44m 16s)
  • 11:26 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:21 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:21 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:21 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:17 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup1004.eqiad.wmnet: Renew puppet certificate - jynus@cumin1002
  • 11:16 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:15 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:15 effie: starting staged rollout of Excimer to 1.2.5, mw-api-ext
  • 11:15 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 11:11 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entry for rgw.codfw.dpe.anycast.wmnet - cmooney@cumin1003"
  • 11:07 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:06 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:05 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:05 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 11:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 11:04 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:03 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:01 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 10:54 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:51 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:50 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:49 effie: starting staged rollout of Excimer to 1.2.5 mw-debug first, mw-api-int next
  • 10:47 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:44 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:43 jiji@deploy1003: Locking from deployment [ALL REPOSITORIES]: T397907 - Upgrade Excimer to 1.2.5 in production in progress, blocking deploys
  • 10:42 jiji@deploy1003: Stopping before sync operations
  • 10:26 jiji@deploy1003: Started scap sync-world: T397907 - Upgrade Excimer to 1.2.5 in production
  • 10:23 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1005.eqiad.wmnet with reason: Maintenance and reboot
  • 10:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2001.codfw.wmnet
  • 10:08 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver2001.codfw.wmnet
  • 10:05 volans@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on debmonitor2003.codfw.wmnet,debmonitor1003.eqiad.wmnet,debmonitor-dev2001.codfw.wmnet with reason: deploy new version
  • 10:00 volans: upgrading production debmonitor-server to the latest v0.6.5
  • 09:39 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2213 weights T398594', diff saved to https://phabricator.wikimedia.org/P78747 and previous config saved to /var/cache/conftool/dbconfig/20250703-093943-fceratto.json
  • 09:36 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2192 to s5 primary T398594', diff saved to https://phabricator.wikimedia.org/P78746 and previous config saved to /var/cache/conftool/dbconfig/20250703-093612-fceratto.json
  • 09:34 federico3: Starting s5 codfw failover from db2213 to db2192 - T398594
  • 09:31 vgutierrez: repooling cp7006
  • 09:30 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7006.magru.wmnet
  • 09:30 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7006.magru.wmnet
  • 09:25 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2192 from API/vslow/dump T398594', diff saved to https://phabricator.wikimedia.org/P78745 and previous config saved to /var/cache/conftool/dbconfig/20250703-092522-fceratto.json
  • 09:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T398594
  • 09:21 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup1004.eqiad.wmnet with reason: Maintenance and reboot
  • 09:21 fceratto@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T398593
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6003.drmrs.wmnet with OS bookworm
  • 08:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1002.eqiad.wmnet
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6003.drmrs.wmnet with reason: host reimage
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6003.drmrs.wmnet with reason: host reimage
  • 08:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host krb1002.eqiad.wmnet
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1048.eqiad.wmnet
  • 08:42 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
  • 08:37 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
  • 08:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 08:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6003.drmrs.wmnet with OS bookworm
  • 08:29 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6003.drmrs.wmnet with reason: reimage
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast6003.wikimedia.org to plain
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast6003.wikimedia.org to plain
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 08:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6001.drmrs.wmnet to plain
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6001.drmrs.wmnet to plain
  • 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6001.drmrs.wmnet to plain
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6001.wikimedia.org to plain
  • 08:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6001.wikimedia.org to plain
  • 08:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:14 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.8 refs T392178
  • 08:13 volans: uploaded debmonitor-server,python3-debmonitor_0.6.5 to apt.wikimedia.org bookworm-wikimedia
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install6002.wikimedia.org to plain
  • 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install6002.wikimedia.org to plain
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 07:53 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 07:53 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 07:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool pc4 T378715', diff saved to https://phabricator.wikimedia.org/P78744 and previous config saved to /var/cache/conftool/dbconfig/20250703-075225-ladsgroup.json
  • 07:52 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 07:51 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:50 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 07:49 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:42 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: haproxy testing
  • 07:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 07:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Feature: search in response reasons - oblivian@cumin1003"
  • 07:39 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: search in response reasons - oblivian@cumin1003
  • 07:38 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Feature: search in response reasons - oblivian@cumin1003
  • 07:38 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Feature: search in response reasons - oblivian@cumin1003"
  • 07:34 effie: upload php-excimer_1.2.5-1+wmf11u1
  • 07:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for codeFolding: fix folding <ref> (T398430) (duration: 12m 16s)
  • 07:21 ladsgroup@deploy1003: musikanimal, ladsgroup: Continuing with sync
  • 07:18 vgutierrez: depooling cp7006 for requestctl debugging
  • 07:16 ladsgroup@deploy1003: musikanimal, ladsgroup: Backport for codeFolding: fix folding <ref> (T398430) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:14 ladsgroup@deploy1003: Started scap sync-world: Backport for codeFolding: fix folding <ref> (T398430)
  • 07:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 07:02 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on prometheus6002.drmrs.wmnet with reason: switch disk type back to DRBD
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6002.wikimedia.org to drbd
  • 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 06:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6002.wikimedia.org to drbd
  • 06:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6002.drmrs.wmnet to drbd
  • 06:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6002.drmrs.wmnet to drbd
  • 03:38 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 03:22 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 03:18 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 03:06 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 01:56 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:56 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:55 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:54 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:53 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:53 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:53 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:09 swfrench-wmf: reprepro include php-msgpack_3.0.0-1+wmf11u1 in component/php83 - T398245
  • 00:08 swfrench-wmf: reprepro include php-igbinary_3.2.16-4+wmf11u1 in component/php83 - T398245
  • 00:03 swfrench-wmf: reprepro include php-apcu_5.1.24-1+wmf11u1 in component/php83 - T398245

2025-07-02

  • 23:40 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 23:38 tzatziki: removing 15 files for legal compliance
  • 23:25 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 23:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 23:07 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 23:05 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 23:05 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 23:02 ryankemper: [WDQS] `ryankemper@wdqs2009:~$ sudo systemctl restart prometheus-blazegraph-exporter-wdqs-blazegraph.service`
  • 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:51 dancy@deploy1003: Installation of scap version "4.186.0" completed for 2 hosts
  • 22:49 dancy@deploy1003: Installing scap version "4.186.0" for 2 host(s)
  • 22:49 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:40 ryankemper: [WDQS] Restart wdqs-blazegraph on wdqs2009
  • 22:27 zabe@deploy1003: Finished scap sync-world: Backport for ApiQueryCategoryMembers: Use correct index for categorylinks (T385890 T398448) (duration: 09m 12s)
  • 22:21 zabe@deploy1003: zabe: Continuing with sync
  • 22:21 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 22:19 zabe@deploy1003: zabe: Backport for ApiQueryCategoryMembers: Use correct index for categorylinks (T385890 T398448) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:19 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:19 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:17 zabe@deploy1003: Started scap sync-world: Backport for ApiQueryCategoryMembers: Use correct index for categorylinks (T385890 T398448)
  • 22:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 22:07 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:02 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:59 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:49 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:23 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:23 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:22 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:22 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:20 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:20 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:16 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:15 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:14 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:13 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:12 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:11 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:05 krinkle@deploy1003: Finished scap sync-world: Backport for missing.php: Support beta suffix for auth.wikimedia error page (T289318) (duration: 29m 54s)
  • 20:59 krinkle@deploy1003: krinkle: Continuing with sync
  • 20:37 krinkle@deploy1003: krinkle: Backport for missing.php: Support beta suffix for auth.wikimedia error page (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:35 krinkle@deploy1003: Started scap sync-world: Backport for missing.php: Support beta suffix for auth.wikimedia error page (T289318)
  • 20:34 swfrench-wmf: reprepro include dh-php_5.5+wmf11u1 in component/php83 - T398245
  • 20:31 krinkle@deploy1003: Finished scap sync-world: Beta patches Iff58893f, I62b31535, I228d7766a57 (duration: 03m 06s)
  • 20:30 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 20:29 swfrench-wmf: reprepro include php-defaults_94+wmf11u1 in component/php83 - T398245
  • 20:28 krinkle@deploy1003: Started scap sync-world: Beta patches Iff58893f, I62b31535, I228d7766a57
  • 20:10 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 20:06 Krinkle: krinkle@deploy1003:/srv/mediawiki$ git remote rm gerrit -- Fix `jforrester@gerrit.wikimedia.org: Permission denied (publickey).` There were two remotes: $ git remote -v gerrit ssh://jforrester@gerrit origin ssh://gerrit.wikimedia.org:29418
  • 20:06 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:47 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 18:42 swfrench-wmf: reprepro include php8.3_8.3.22-1+wmf11u1 in component/php83 - T398245
  • 17:53 swfrench-wmf: reprepro update component/php83 with pcre2 10.42-1~wmf11+1 - T398245
  • 17:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2330.codfw.wmnet
  • 17:41 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2330.codfw.wmnet
  • 17:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2329.codfw.wmnet
  • 17:36 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2329.codfw.wmnet
  • 17:36 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2328.codfw.wmnet
  • 17:34 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2328.codfw.wmnet
  • 17:34 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2327.codfw.wmnet
  • 17:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2327.codfw.wmnet
  • 17:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2326.codfw.wmnet
  • 17:29 dzahn@dns1004: END - running authdns-update
  • 17:28 dzahn@dns1004: START - running authdns-update
  • 17:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2326.codfw.wmnet
  • 17:26 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2325.codfw.wmnet
  • 17:21 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2325.codfw.wmnet
  • 17:21 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2324.codfw.wmnet
  • 17:15 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2324.codfw.wmnet
  • 17:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2323.codfw.wmnet
  • 17:10 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2323.codfw.wmnet
  • 17:10 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2322.codfw.wmnet
  • 17:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2322.codfw.wmnet
  • 17:04 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2321.codfw.wmnet
  • 16:58 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2321.codfw.wmnet
  • 16:58 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2320.codfw.wmnet
  • 16:53 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2320.codfw.wmnet
  • 16:53 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2319.codfw.wmnet
  • 16:48 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2319.codfw.wmnet
  • 16:48 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2318.codfw.wmnet
  • 16:47 inflatador: bking@cumin1002 restarting cirrussearch codfw T397227
  • 16:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 16:43 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 16:43 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2318.codfw.wmnet
  • 16:43 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2317.codfw.wmnet
  • 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 16:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2317.codfw.wmnet
  • 16:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2316.codfw.wmnet
  • 16:33 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2316.codfw.wmnet
  • 16:33 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2315.codfw.wmnet
  • 16:28 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2315.codfw.wmnet
  • 16:28 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2314.codfw.wmnet
  • 16:22 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2314.codfw.wmnet
  • 16:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2313.codfw.wmnet
  • 16:17 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2313.codfw.wmnet
  • 16:17 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2312.codfw.wmnet
  • 16:13 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:12 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2312.codfw.wmnet
  • 16:12 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2311.codfw.wmnet
  • 16:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 16:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 16:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2311.codfw.wmnet
  • 16:06 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2310.codfw.wmnet
  • 16:01 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2310.codfw.wmnet
  • 16:01 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2309.codfw.wmnet
  • 15:56 vgutierrez: switch lvs4010 to katran - 10.128.0.11
  • 15:56 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2309.codfw.wmnet
  • 15:56 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2308.codfw.wmnet
  • 15:55 jnuche@deploy1003: Finished scap sync-world: Backport for Rename EventRegistration::$meetingAddress to $address for cache compat (T398413) (duration: 08m 51s)
  • 15:53 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2308.codfw.wmnet
  • 15:53 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2307.codfw.wmnet
  • 15:49 jnuche@deploy1003: jnuche, daimona: Continuing with sync
  • 15:49 jnuche@deploy1003: jnuche, daimona: Backport for Rename EventRegistration::$meetingAddress to $address for cache compat (T398413) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:48 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2307.codfw.wmnet
  • 15:48 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2306.codfw.wmnet
  • 15:47 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs4010.ulsfo.wmnet with reason: katran migration
  • 15:46 jnuche@deploy1003: Started scap sync-world: Backport for Rename EventRegistration::$meetingAddress to $address for cache compat (T398413)
  • 15:42 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2306.codfw.wmnet
  • 15:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2305.codfw.wmnet
  • 15:38 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2305.codfw.wmnet
  • 15:38 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2304.codfw.wmnet
  • 15:33 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2304.codfw.wmnet
  • 15:33 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2303.codfw.wmnet
  • 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 15:28 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2303.codfw.wmnet
  • 15:28 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2302.codfw.wmnet
  • 15:22 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2302.codfw.wmnet
  • 15:22 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2301.codfw.wmnet
  • 15:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 15:17 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2301.codfw.wmnet
  • 15:17 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2300.codfw.wmnet
  • 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 15:15 vgutierrez: repool cp7006
  • 15:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2014
  • 15:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host pc2014
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2300.codfw.wmnet
  • 15:11 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2299.codfw.wmnet
  • 15:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:08 dancy@deploy1003: Installation of scap version "4.185.0" completed for 2 hosts
  • 15:06 jiji@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=eqiad
  • 15:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2299.codfw.wmnet
  • 15:06 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2298.codfw.wmnet
  • 15:06 dancy@deploy1003: Installing scap version "4.185.0" for 2 host(s)
  • 15:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6002.drmrs.wmnet to cluster drmrs02 and group B13
  • 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6002.drmrs.wmnet to cluster drmrs02 and group B13
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 15:01 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2298.codfw.wmnet
  • 15:01 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2297.codfw.wmnet
  • 15:00 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:57 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2014
  • 14:56 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2014
  • 14:55 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2297.codfw.wmnet
  • 14:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2296.codfw.wmnet
  • 14:55 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 14:52 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6002.drmrs.wmnet with OS bookworm
  • 14:50 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2296.codfw.wmnet
  • 14:50 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2295.codfw.wmnet
  • 14:47 jiji@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=eqiad
  • 14:45 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2295.codfw.wmnet
  • 14:44 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2294.codfw.wmnet
  • 14:42 godog: bounce thanos-store on titan1002
  • 14:40 oblivian@deploy1003: Finished scap sync-world: Backport for Revert "group1: Set categorylinks to read new" (duration: 08m 26s)
  • 14:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2294.codfw.wmnet
  • 14:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2293.codfw.wmnet
  • 14:39 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@1bb179b]: bump section topics to v1.6.0 (duration: 00m 47s)
  • 14:38 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@1bb179b]: bump section topics to v1.6.0
  • 14:38 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 14:38 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: activate new plugins packages - bking@cumin1002 - T397227
  • 14:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:35 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:35 oblivian@deploy1003: zabe, oblivian: Continuing with sync
  • 14:34 oblivian@deploy1003: zabe, oblivian: Backport for Revert "group1: Set categorylinks to read new" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:34 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2293.codfw.wmnet
  • 14:34 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2292.codfw.wmnet
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6002.drmrs.wmnet with reason: host reimage
  • 14:31 oblivian@deploy1003: Started scap sync-world: Backport for Revert "group1: Set categorylinks to read new"
  • 14:31 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
  • 14:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
  • 14:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 14:28 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2292.codfw.wmnet
  • 14:28 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2291.codfw.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6002.drmrs.wmnet with reason: host reimage
  • 14:23 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2291.codfw.wmnet
  • 14:23 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2290.codfw.wmnet
  • 14:18 zabe@deploy1003: Finished scap sync-world: retry revert (duration: 04m 27s)
  • 14:18 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2290.codfw.wmnet
  • 14:17 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2289.codfw.wmnet
  • 14:14 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: activate new plugins packages - bking@cumin1002 - T397227
  • 14:14 zabe@deploy1003: Started scap sync-world: retry revert
  • 14:12 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2289.codfw.wmnet
  • 14:12 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2288.codfw.wmnet
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6002.drmrs.wmnet with OS bookworm
  • 14:08 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2288.codfw.wmnet
  • 14:07 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2287.codfw.wmnet
  • 14:06 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6002.drmrs.wmnet with reason: reimage
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 14:03 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2287.codfw.wmnet
  • 14:02 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2286.codfw.wmnet
  • 14:01 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 13:53 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: activate new plugins packages - bking@cumin1002 - T397227
  • 13:53 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: activate new plugins packages - bking@cumin1002 - T397227
  • 13:52 zabe@deploy1003: sync-world aborted: T397912 (duration: 04m 03s)
  • 13:48 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2283.codfw.wmnet
  • 13:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2282.codfw.wmnet
  • 13:40 zabe@deploy1003: Started scap sync-world: T397912
  • 13:39 _joe_: repooling cp7006, testing logging improvements
  • 13:37 vgutierrez: switch upload@eqsin to the new upload cert - T394484
  • 13:35 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2282.codfw.wmnet
  • 13:35 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2281.codfw.wmnet
  • 13:30 zabe@deploy1003: zabe: Continuing with sync
  • 13:30 moritzm: failover Ganeti master in drmrs02 to ganeti6004 T382513
  • 13:30 zabe@deploy1003: zabe: Backport for group1: Set categorylinks to read new (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:29 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2281.codfw.wmnet
  • 13:29 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2280.codfw.wmnet
  • 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 13:27 zabe@deploy1003: Started scap sync-world: Backport for group1: Set categorylinks to read new (T397912)
  • 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6002.wikimedia.org to drbd
  • 13:24 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2280.codfw.wmnet
  • 13:24 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2279.codfw.wmnet
  • 13:21 _joe_: depooling cp7006 for testing
  • 13:18 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2279.codfw.wmnet
  • 13:18 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2278.codfw.wmnet
  • 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6002.wikimedia.org to drbd
  • 13:18 moritzm: installing rsyslog bugfix updates from Bookworm point release
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6002.drmrs.wmnet to drbd
  • 13:17 samtar@deploy1003: Finished scap sync-world: Backport for Assign oathauth-verify-user to default bureaucrat (T265726), Add abusefilter-revert to sysops on testwiki (T398107) (duration: 11m 16s)
  • 13:13 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2278.codfw.wmnet
  • 13:13 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2277.codfw.wmnet
  • 13:13 jgreen@dns1004: END - running authdns-update
  • 13:11 jgreen@dns1004: START - running authdns-update
  • 13:11 samtar@deploy1003: samtar, eggroll97: Continuing with sync
  • 13:08 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6002.drmrs.wmnet to drbd
  • 13:08 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2277.codfw.wmnet
  • 13:08 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2276.codfw.wmnet
  • 13:08 samtar@deploy1003: samtar, eggroll97: Backport for Assign oathauth-verify-user to default bureaucrat (T265726), Add abusefilter-revert to sysops on testwiki (T398107) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 13:05 samtar@deploy1003: Started scap sync-world: Backport for Assign oathauth-verify-user to default bureaucrat (T265726), Add abusefilter-revert to sysops on testwiki (T398107)
  • 13:02 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2276.codfw.wmnet
  • 13:02 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2275.codfw.wmnet
  • 12:58 urbanecm@deploy1003: Finished scap sync-world: Backport for [Growth] Move Impact limit configuration to ext-GrowthExperiments (T341599), [Growth] enwiki: Decrease wgGEUserImpactMaxEdits to 1000 (T398418 T341599) (duration: 09m 42s)
  • 12:57 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2275.codfw.wmnet
  • 12:57 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2274.codfw.wmnet
  • 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6002.drmrs.wmnet to drbd
  • 12:52 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 12:52 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2274.codfw.wmnet
  • 12:52 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2273.codfw.wmnet
  • 12:51 urbanecm@deploy1003: urbanecm: Backport for [Growth] Move Impact limit configuration to ext-GrowthExperiments (T341599), [Growth] enwiki: Decrease wgGEUserImpactMaxEdits to 1000 (T398418 T341599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:49 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] Move Impact limit configuration to ext-GrowthExperiments (T341599), [Growth] enwiki: Decrease wgGEUserImpactMaxEdits to 1000 (T398418 T341599)
  • 12:47 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2273.codfw.wmnet
  • 12:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2272.codfw.wmnet
  • 12:41 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2272.codfw.wmnet
  • 12:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2271.codfw.wmnet
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 12:36 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2271.codfw.wmnet
  • 12:36 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2270.codfw.wmnet
  • 12:30 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2270.codfw.wmnet
  • 12:30 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2269.codfw.wmnet
  • 12:25 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2269.codfw.wmnet
  • 12:25 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2268.codfw.wmnet
  • 12:20 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2268.codfw.wmnet
  • 12:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2267.codfw.wmnet
  • 12:14 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2267.codfw.wmnet
  • 12:14 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2266.codfw.wmnet
  • 12:14 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:10 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 12:10 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:09 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2266.codfw.wmnet
  • 12:09 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2265.codfw.wmnet
  • 12:08 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:08 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:07 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:06 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2265.codfw.wmnet
  • 12:04 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2264.codfw.wmnet
  • 11:58 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2264.codfw.wmnet
  • 11:58 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2263.codfw.wmnet
  • 11:53 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2263.codfw.wmnet
  • 11:52 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2262.codfw.wmnet
  • 11:47 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2262.codfw.wmnet
  • 11:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2261.codfw.wmnet
  • 11:47 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:42 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:42 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2261.codfw.wmnet
  • 11:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2260.codfw.wmnet
  • 11:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
  • 11:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to drbd
  • 11:37 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2260.codfw.wmnet
  • 11:37 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2259.codfw.wmnet
  • 11:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 37271
  • 11:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 37271
  • 11:33 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 11:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2259.codfw.wmnet
  • 11:31 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2258.codfw.wmnet
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:26 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2258.codfw.wmnet
  • 11:26 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2257.codfw.wmnet
  • 11:21 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2257.codfw.wmnet
  • 11:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2256.codfw.wmnet
  • 11:19 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:16 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.changedisk (exit_code=99) for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:16 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to drbd
  • 11:15 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2256.codfw.wmnet
  • 11:15 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2255.codfw.wmnet
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti6004.drmrs.wmnet to cluster drmrs02 and group B13
  • 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti6004.drmrs.wmnet to cluster drmrs02 and group B13
  • 11:10 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2255.codfw.wmnet
  • 11:09 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2254.codfw.wmnet
  • 11:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2254.codfw.wmnet
  • 11:04 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2253.codfw.wmnet
  • 11:00 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2253.codfw.wmnet
  • 11:00 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2252.codfw.wmnet
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:55 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2252.codfw.wmnet
  • 10:55 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2251.codfw.wmnet
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:50 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2251.codfw.wmnet
  • 10:50 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
  • 10:49 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2250.codfw.wmnet
  • 10:49 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
  • 10:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37271
  • 10:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 37271
  • 10:47 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:47 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:47 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137236
  • 10:47 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137236
  • 10:44 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2250.codfw.wmnet
  • 10:44 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2249.codfw.wmnet
  • 10:44 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:43 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:43 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:42 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:40 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti6004.drmrs.wmnet with OS bookworm
  • 10:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2249.codfw.wmnet
  • 10:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker2248.codfw.wmnet
  • 10:35 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 10:35 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 10:33 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker2248.codfw.wmnet
  • 10:33 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:32 klausman@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:30 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
  • 10:28 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:28 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:28 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:27 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:27 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:26 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 10:21 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1092.eqiad.wmnet with OS bullseye
  • 10:21 mvernon@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:21 mvernon@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1093.eqiad.wmnet with OS bullseye
  • 10:18 mvernon@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti6004.drmrs.wmnet with reason: host reimage
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti6004.drmrs.wmnet with reason: host reimage
  • 10:13 mvernon@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1003"
  • 10:08 kharlan@deploy1003: Finished scap sync-world: Backport for UserInfoCard: prevent default link behavior with "click" (T398323) (duration: 09m 52s)
  • 10:04 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts backup1001.eqiad.wmnet
  • 10:04 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:04 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 10:04 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 10:03 kharlan@deploy1003: kharlan: Continuing with sync
  • 10:02 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 10:01 kharlan@deploy1003: kharlan: Backport for UserInfoCard: prevent default link behavior with "click" (T398323) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:00 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 09:58 kharlan@deploy1003: Started scap sync-world: Backport for UserInfoCard: prevent default link behavior with "click" (T398323)
  • 09:57 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 09:55 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts backup1001.eqiad.wmnet
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti6004.drmrs.wmnet with OS bookworm
  • 09:54 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 09:53 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti6004.drmrs.wmnet with reason: reimage
  • 09:50 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh6002.wikimedia.org to plain
  • 09:49 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh6002.wikimedia.org to plain
  • 09:49 vgutierrez: acme-chief: stop issuing RSA certificates by default - T398020
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum6002.drmrs.wmnet to plain
  • 09:47 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts backup2001.codfw.wmnet
  • 09:47 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:47 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 09:47 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: backup2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum6002.drmrs.wmnet to plain
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 09:45 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus6002.drmrs.wmnet to plain
  • 09:44 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes: api auth and bwlimit rules - oblivian@cumin1003"
  • 09:44 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes: api auth and bwlimit rules - oblivian@cumin1003
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir6002.drmrs.wmnet to plain
  • 09:43 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes: api auth and bwlimit rules - oblivian@cumin1003
  • 09:43 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes: api auth and bwlimit rules - oblivian@cumin1003"
  • 09:42 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 09:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir6002.drmrs.wmnet to plain
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow6001.drmrs.wmnet to plain
  • 09:39 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow6001.drmrs.wmnet to plain
  • 09:37 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts backup2001.codfw.wmnet
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 09:36 zabe@deploy1003: Finished scap sync-world: Backport for Reapply "categorylinks: Set group0 to read new" (T397912) (duration: 10m 15s)
  • 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 09:30 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1092.eqiad.wmnet with OS bullseye
  • 09:29 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1093.eqiad.wmnet with OS bullseye
  • 09:28 zabe@deploy1003: zabe: Continuing with sync
  • 09:27 zabe@deploy1003: zabe: Backport for Reapply "categorylinks: Set group0 to read new" (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:25 zabe@deploy1003: Started scap sync-world: Backport for Reapply "categorylinks: Set group0 to read new" (T397912)
  • 09:23 zabe@deploy1003: Finished scap sync-world: Backport for Fix categorylinks join order and use index on correct table (T398380) (duration: 08m 26s)
  • 09:18 zabe@deploy1003: zabe: Continuing with sync
  • 09:17 zabe@deploy1003: zabe: Backport for Fix categorylinks join order and use index on correct table (T398380) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:15 zabe@deploy1003: Started scap sync-world: Backport for Fix categorylinks join order and use index on correct table (T398380)
  • 09:06 volans: uploaded debmonitor-server,python3-debmonitor_0.6.4 to apt.wikimedia.org bookworm-wikimedia
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 09:06 jmm@dns1004: END - running authdns-update
  • 09:05 jmm@dns1004: START - running authdns-update
  • 09:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti3006.esams.wmnet to cluster esams02 and group BW27
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3006.esams.wmnet to cluster esams02 and group BW27
  • 09:01 moritzm: rebalance ganeti/eqsin following Bookworm reimages
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 08:56 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5007.eqsin.wmnet to cluster eqsin and group 1
  • 08:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 08:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 08:34 jmm@dns1004: END - running authdns-update
  • 08:33 jmm@dns1004: START - running authdns-update
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5007.eqsin.wmnet with OS bookworm
  • 08:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2004.codfw.wmnet
  • 08:20 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver2004.codfw.wmnet
  • 08:16 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.8 refs T392178
  • 08:10 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1003.eqiad.wmnet
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5007.eqsin.wmnet with reason: host reimage
  • 08:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver1003.eqiad.wmnet
  • 07:50 jmm@dns1004: END - running authdns-update
  • 07:49 jmm@dns1004: START - running authdns-update
  • 07:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5007.eqsin.wmnet with OS bookworm
  • 07:38 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5007.eqsin.wmnet with reason: reimage
  • 06:29 Amir1: dropping l10n_cache table everywhere (T397367)
  • 06:28 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Switch to 10G (T378715)
  • 06:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool pc4 T378715', diff saved to https://phabricator.wikimedia.org/P78735 and previous config saved to /var/cache/conftool/dbconfig/20250702-061517-ladsgroup.json
  • 06:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 06:02 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 05:58 slyngshede@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 05:57 slyngshede@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.11 to netbox-next - slyngshede@cumin1002 - T397300
  • 02:50 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 02:32 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 02:28 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 02:12 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bullseye
  • 00:53 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm

2025-07-01

  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 23:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1093.eqiad.wmnet with reason: host reimage
  • 23:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 23:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1092.eqiad.wmnet with reason: host reimage
  • 23:19 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
  • 23:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 23:08 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1093.eqiad.wmnet with OS bullseye
  • 23:03 zabe@deploy1003: Finished scap sync-world: Backport for Revert "categorylinks: Set group0 to read new" (T397912 T398380) (duration: 08m 49s)
  • 22:58 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS bullseye
  • 22:57 zabe@deploy1003: zabe: Continuing with sync
  • 22:57 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 22:57 zabe@deploy1003: zabe: Backport for Revert "categorylinks: Set group0 to read new" (T397912 T398380) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1092.eqiad.wmnet with OS bullseye
  • 22:54 zabe@deploy1003: Started scap sync-world: Backport for Revert "categorylinks: Set group0 to read new" (T397912 T398380)
  • 22:54 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sync - dzahn@cumin1002"
  • 22:54 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sync - dzahn@cumin1002"
  • 22:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:53 zabe@deploy1003: Finished scap sync-world: Backport for categorylinks: Set group0 to read new (T397912) (duration: 08m 40s)
  • 22:49 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:48 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1093.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:47 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:47 zabe@deploy1003: zabe: Continuing with sync
  • 22:46 zabe@deploy1003: zabe: Backport for categorylinks: Set group0 to read new (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:45 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts miscweb1003.eqiad.wmnet
  • 22:45 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:44 zabe@deploy1003: Started scap sync-world: Backport for categorylinks: Set group0 to read new (T397912)
  • 22:44 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:44 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:36 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:35 toyofuku@deploy1003: Finished scap sync-world: Backport for Update mobile search overlay temporary input styles (duration: 29m 56s)
  • 22:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1092.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2001.codfw.wmnet with OS bookworm
  • 22:31 dzahn@cumin1002: START - Cookbook sre.hosts.decommission for hosts miscweb1003.eqiad.wmnet
  • 22:30 toyofuku@deploy1003: bwang, toyofuku: Continuing with sync
  • 22:28 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:28 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts miscweb2003.codfw.wmnet
  • 22:28 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:28 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 22:28 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 22:26 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 22:23 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:22 ejegg: payments-wiki upgraded from a92f03c3 to 9c7f3a73
  • 22:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1092.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1093.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:18 dzahn@cumin1002: START - Cookbook sre.hosts.decommission for hosts miscweb2003.codfw.wmnet
  • 22:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns ms-be1092,934 - jclark@cumin1002"
  • 22:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns ms-be1092,934 - jclark@cumin1002"
  • 22:14 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:10 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:10 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:09 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:07 toyofuku@deploy1003: bwang, toyofuku: Backport for Update mobile search overlay temporary input styles synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:06 ejegg: fundraising scheduled jobs restarted
  • 22:05 toyofuku@deploy1003: Started scap sync-world: Backport for Update mobile search overlay temporary input styles
  • 22:04 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:02 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on miscweb2003.codfw.wmnet with reason: decom
  • 22:01 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: decom
  • 21:59 toyofuku@deploy1003: Finished scap sync-world: Backport for Enable mobile search recommendations in all eligible wikis except enwiki (duration: 10m 10s)
  • 21:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
  • 21:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:55 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
  • 21:54 toyofuku@deploy1003: toyofuku, bwang: Continuing with sync
  • 21:51 toyofuku@deploy1003: toyofuku, bwang: Backport for Enable mobile search recommendations in all eligible wikis except enwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:51 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:51 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:49 toyofuku@deploy1003: Started scap sync-world: Backport for Enable mobile search recommendations in all eligible wikis except enwiki
  • 21:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:45 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 21:25 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 21:06 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 21:03 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 20:48 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:46 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 20:45 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:45 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 20:41 cjming@deploy1003: Finished scap sync-world: Backport for zhwiki: Permissions change for abusefilter groups (T397788) (duration: 10m 35s)
  • 20:39 ejegg: fundraising civicrm upgraded from 5ae93148 to 521d0dbe
  • 20:36 cjming@deploy1003: zhaofjx, cjming: Continuing with sync
  • 20:33 cjming@deploy1003: zhaofjx, cjming: Backport for zhwiki: Permissions change for abusefilter groups (T397788) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:31 cjming@deploy1003: Started scap sync-world: Backport for zhwiki: Permissions change for abusefilter groups (T397788)
  • 20:26 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:26 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:24 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
  • 20:20 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1005.eqiad.wmnet with reason: host reimage
  • 20:20 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:04 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 20:04 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 20:03 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 20:02 ejegg: disabled queue consumers for segment updates
  • 19:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:50 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:47 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host sessionstore1005.eqiad.wmnet with OS bullseye
  • 19:43 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 19:42 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:37 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:26 kemayo@deploy1003: Finished scap sync-world: Backport for Edit check: fix counter logging for SLO (T395444) (duration: 09m 07s)
  • 19:23 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:20 kemayo@deploy1003: kemayo: Continuing with sync
  • 19:20 andrew@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2004-dev.codfw.wmnet with reason: host reimage
  • 19:19 kemayo@deploy1003: kemayo: Backport for Edit check: fix counter logging for SLO (T395444) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:17 kemayo@deploy1003: Started scap sync-world: Backport for Edit check: fix counter logging for SLO (T395444)
  • 19:00 andrew@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephmon2004-dev.codfw.wmnet with OS bookworm
  • 19:00 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
  • 17:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 16:56 andrew@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd2003-dev.codfw.wmnet
  • 16:56 andrew@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 andrew@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1003"
  • 16:55 andrew@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1003"
  • 16:51 andrew@cumin1003: START - Cookbook sre.dns.netbox
  • 16:45 andrew@cumin1003: START - Cookbook sre.hosts.decommission for hosts cloudcephosd2003-dev.codfw.wmnet
  • 16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool pc3 T378715', diff saved to https://phabricator.wikimedia.org/P78734 and previous config saved to /var/cache/conftool/dbconfig/20250701-164405-ladsgroup.json
  • 16:37 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:37 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:13 swfrench@deploy1003: Finished scap sync-world: Backport for Remove title-case overrides for PHP 8.1 migration (T394556) (duration: 09m 21s)
  • 16:11 inflatador: bking@prometheus1005:~$ sudo run-puppet-agent T398341
  • 16:10 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 16:07 swfrench@deploy1003: swfrench: Continuing with sync
  • 16:06 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2013
  • 16:06 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pc2013
  • 16:06 swfrench@deploy1003: swfrench: Backport for Remove title-case overrides for PHP 8.1 migration (T394556) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:04 swfrench@deploy1003: Started scap sync-world: Backport for Remove title-case overrides for PHP 8.1 migration (T394556)
  • 16:01 swfrench-wmf: finished page renames for Unicode title-case transition - T396903
  • 15:54 swfrench-wmf: starting page renames for Unicode title-case transition - T396903
  • 15:51 swfrench-wmf: renamed 1 user for Unicode title-case transition - T396903
  • 15:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7003.magru.wmnet
  • 15:44 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7003.magru.wmnet
  • 15:37 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 15:37 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 15:35 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: katran migration
  • 15:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 15:25 ejegg: SmashPig upgraded from 8486f9fb to 52397453
  • 15:21 ejegg: SmashPig upgraded from bdc59e01 to 8486f9fb
  • 15:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 15:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
  • 15:10 brennen@deploy1003: Finished deploy [phabricator/deployment@311587a]: deploy phab1004 for T398328 (duration: 00m 37s)
  • 15:09 brennen@deploy1003: Started deploy [phabricator/deployment@311587a]: deploy phab1004 for T398328
  • 15:09 brennen@deploy1003: Finished deploy [phabricator/deployment@311587a]: deploy phab2002 for T398328 (duration: 00m 41s)
  • 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@311587a]: deploy phab2002 for T398328
  • 15:08 ejegg: standalone SmashPig upgraded from ad4baa32 to bdc59e01
  • 15:08 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
  • 15:04 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:02 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:02 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:01 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:00 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:57 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:55 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:54 moritzm: failover Ganeti master in eqsin to ganeti5004
  • 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 14:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5006.eqsin.wmnet to cluster eqsin and group 1
  • 14:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 14:26 cgoubert@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 14:25 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5006.eqsin.wmnet with OS bookworm
  • 13:51 cgoubert@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 13:51 zabe@deploy1003: Finished scap sync-world: Backport for categorylinks: Set testwiki to read new (T397912) (duration: 09m 44s)
  • 13:45 zabe@deploy1003: zabe: Continuing with sync
  • 13:44 zabe@deploy1003: zabe: Backport for categorylinks: Set testwiki to read new (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:43 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:41 zabe@deploy1003: Started scap sync-world: Backport for categorylinks: Set testwiki to read new (T397912)
  • 13:40 jelto@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:39 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:37 jelto@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:36 jelto@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:35 jelto@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:29 urbanecm@deploy1003: Finished scap sync-world: Backport for Growth: Configure higher impact module edit limits for english and test wiki (T341599) (duration: 19m 10s)
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 13:23 urbanecm@deploy1003: urbanecm, cyndywikime: Continuing with sync
  • 13:21 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5006.eqsin.wmnet with reason: host reimage
  • 13:13 urbanecm@deploy1003: urbanecm, cyndywikime: Backport for Growth: Configure higher impact module edit limits for english and test wiki (T341599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:12 jmm@dns1004: END - running authdns-update
  • 13:11 jmm@dns1004: START - running authdns-update
  • 13:10 urbanecm@deploy1003: Started scap sync-world: Backport for Growth: Configure higher impact module edit limits for english and test wiki (T341599)
  • 12:59 XioNoX: setup BGP to Paylb on pfw1-eqiad - T397865
  • 12:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5006.eqsin.wmnet with OS bookworm
  • 12:57 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5006.eqsin.wmnet with reason: reimage
  • 12:53 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:51 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:49 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 12:48 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 12:45 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1004.eqiad.wmnet
  • 12:39 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1004.eqiad.wmnet
  • 12:39 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet
  • 12:38 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1002.eqiad.wmnet
  • 12:38 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:38 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:35 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:34 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:34 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:33 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:32 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:32 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver1002.eqiad.wmnet
  • 12:31 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1003.eqiad.wmnet
  • 12:31 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:31 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:29 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet
  • 12:23 jmm@dns1004: END - running authdns-update
  • 12:22 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:22 jmm@dns1004: START - running authdns-update
  • 12:21 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1002.eqiad.wmnet
  • 12:21 moritzm: installing libcap2 security updates
  • 12:20 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 12:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2002.codfw.wmnet
  • 12:13 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1001.eqiad.wmnet
  • 12:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetserver2002.codfw.wmnet
  • 12:07 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2005.codfw.wmnet
  • 12:02 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2005.codfw.wmnet
  • 12:00 moritzm: manually clean out external_cloud_vendors directory on puppet 5 frontends to fix Puppet runs
  • 11:59 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2004.codfw.wmnet
  • 11:54 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2004.codfw.wmnet
  • 11:53 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet
  • 11:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:47 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:46 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 11:45 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2003.codfw.wmnet
  • 11:45 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 11:43 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet
  • 11:37 jmm@dns1004: END - running authdns-update
  • 11:36 jmm@dns1004: START - running authdns-update
  • 11:35 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2002.codfw.wmnet
  • 11:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 11:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 11:08 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
  • 11:01 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
  • 11:01 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
  • 10:58 root@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet
  • 10:54 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
  • 10:50 root@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2001.codfw.wmnet
  • 10:39 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
  • 10:33 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2007.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 10:32 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
  • 10:27 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2050.codfw.wmnet to cluster codfw and group B
  • 10:26 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2050.codfw.wmnet to cluster codfw and group B
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
  • 10:19 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts idp-test2004.wikimedia.org
  • 10:19 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:18 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:17 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:17 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: idp-test2004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
  • 10:17 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
  • 10:11 jmm@cumin1003: START - Cookbook sre.dns.netbox
  • 10:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Switch to 10G (T378715)
  • 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool pc3 T378715', diff saved to https://phabricator.wikimedia.org/P78729 and previous config saved to /var/cache/conftool/dbconfig/20250701-100729-ladsgroup.json
  • 10:06 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts idp-test2004.wikimedia.org
  • 09:59 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2007.codfw.wmnet with reason: Maintenance and reboot
  • 09:57 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2006.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 09:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 09:49 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti5005.eqsin.wmnet to cluster eqsin and group 1
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 09:33 hashar@deploy1003: Finished deploy [gerrit/gerrit@4e671a0]: Remove all references to patchdemo legacy - T391866 (duration: 00m 12s)
  • 09:32 hashar@deploy1003: Started deploy [gerrit/gerrit@4e671a0]: Remove all references to patchdemo legacy - T391866
  • 09:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 09:25 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
  • 09:25 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
  • 09:21 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2006.codfw.wmnet with reason: Maintenance and reboot
  • 09:17 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2005.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 09:11 kharlan@deploy1003: Finished scap sync-world: Backport for UserInfoCard: Fix opt-in to temporary account label display (T395661), UserInfoCard can unintentionally render information for more than one user (duration: 09m 15s)
  • 09:05 kharlan@deploy1003: kharlan: Continuing with sync
  • 09:04 kharlan@deploy1003: kharlan: Backport for UserInfoCard: Fix opt-in to temporary account label display (T395661), UserInfoCard can unintentionally render information for more than one user synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:02 kharlan@deploy1003: Started scap sync-world: Backport for UserInfoCard: Fix opt-in to temporary account label display (T395661), UserInfoCard can unintentionally render information for more than one user
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti5005.eqsin.wmnet with OS bookworm
  • 08:55 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:55 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 08:54 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:53 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 08:44 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:44 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2005.codfw.wmnet with reason: Maintenance and reboot
  • 08:42 jynus@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for backup2004.codfw.wmnet: Renew puppet certificate - jynus@cumin1002
  • 08:38 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 08:34 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 08:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti5005.eqsin.wmnet with reason: host reimage
  • 08:12 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.8 refs T392178
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti5005.eqsin.wmnet with OS bookworm
  • 08:08 moritzm: installing sudo security updates
  • 08:07 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2050.codfw.wmnet with OS bookworm
  • 07:58 urbanecm: Manually start a Growth cron job via `kubectl create job growthexperiments-deleteoldsurveys-$(date +"%Y%m%d%H%M") --from=cronjobs/growthexperiments-deleteoldsurveys` to verify whether a recent failure is permanent
  • 07:55 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Corvus out of all services on: 2396 hosts
  • 07:54 vgutierrez: switching upload@ulsfo to upload TLS certificate - T394484
  • 07:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
  • 07:48 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2050.codfw.wmnet with reason: host reimage
  • 07:43 urbanecm@deploy1003: Finished scap sync-world: Backport for nlwiki: add VRT agent user group (T398216) (duration: 12m 04s)
  • 07:43 vgutierrez@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet
  • 07:43 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2004.codfw.wmnet with reason: Maintenance and reboot
  • 07:38 vgutierrez@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4045.ulsfo.wmnet
  • 07:38 urbanecm@deploy1003: urbanecm, daniuu: Continuing with sync
  • 07:37 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti5005.eqsin.wmnet with reason: reimage
  • 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
  • 07:33 urbanecm@deploy1003: urbanecm, daniuu: Backport for nlwiki: add VRT agent user group (T398216) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:31 urbanecm@deploy1003: Started scap sync-world: Backport for nlwiki: add VRT agent user group (T398216)
  • 07:16 kartik@deploy1003: Finished scap sync-world: Backport for Remove cxstats campaign (T393705) (duration: 14m 17s)
  • 07:09 kartik@deploy1003: kartik: Continuing with sync
  • 07:06 kartik@deploy1003: kartik: Backport for Remove cxstats campaign (T393705) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:02 kartik@deploy1003: Started scap sync-world: Backport for Remove cxstats campaign (T393705)
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.5 (duration: 01m 38s)
  • 03:58 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.8 refs T392178 (duration: 55m 48s)
  • 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.8 refs T392178
  • 02:13 ejegg: payments-wiki upgraded from 52f6940f to a92f03c3
  • 01:46 ejegg: fundraising civicrm upgraded from e35d3778 to 5ae93148
  • 00:20 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 00:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 00:19 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 00:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 00:01 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply

