Server Admin Log/Archive 76

From Wikitech


2024-02-29

  • 22:37 foks: removing 4 files for legal compliance
  • 22:02 jdrewniak@deploy2002: Finished scap: Backport for Default to day mode (T358811) (duration: 10m 40s)
  • 21:57 mutante: phabricator - added STran to WMF-NDA (group 61) - T355388
  • 21:54 jdrewniak@deploy2002: jdlrobson and jdrewniak: Continuing with sync
  • 21:52 jdrewniak@deploy2002: jdlrobson and jdrewniak: Backport for Default to day mode (T358811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:51 jdrewniak@deploy2002: Started scap: Backport for Default to day mode (T358811)
  • 21:50 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 23s)
  • 21:42 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 40s)
  • 21:31 mutante: phabricator - added Fring to WMF-NDA (group 61) - T358578
  • 21:29 mutante: phabricator - added Ifeatu_Nnaobi_WMDE to WMF-NDA (group 61) - T358578
  • 21:27 eileen: * civicrm upgraded from aeffaf88 to dd378ea1
  • 21:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2114 (T352010)', diff saved to https://phabricator.wikimedia.org/P58268 and previous config saved to /var/cache/conftool/dbconfig/20240229-212602-ladsgroup.json
  • 21:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 21:24 mutante: LDAP - added uid ifeatunnaobiwmde (46162) to groups nda and wmde (T358091)
  • 21:23 mutante: LDAP - added uid member: uid=ifeatunnaobiwmde,ou=people,dc=wikimedia,dc=org
  • 21:13 jdrewniak@deploy2002: Finished scap: Backport for Performance Impact Assessment for Night Mode Style Correction (T358240) (duration: 09m 28s)
  • 21:05 jdrewniak@deploy2002: mabualruz and jdrewniak: Continuing with sync
  • 21:05 jdrewniak@deploy2002: mabualruz and jdrewniak: Backport for Performance Impact Assessment for Night Mode Style Correction (T358240) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 jdrewniak@deploy2002: Started scap: Backport for Performance Impact Assessment for Night Mode Style Correction (T358240)
  • 20:52 mutante: LDAP - added uid frri (43019) to groups nda and wmde (T358584)
  • 20:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2012.codfw.wmnet with OS bullseye
  • 20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P58267 and previous config saved to /var/cache/conftool/dbconfig/20240229-202158-root.json
  • 20:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P58266 and previous config saved to /var/cache/conftool/dbconfig/20240229-200653-root.json
  • 19:58 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
  • 19:58 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
  • 19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P58265 and previous config saved to /var/cache/conftool/dbconfig/20240229-195148-root.json
  • 19:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P58264 and previous config saved to /var/cache/conftool/dbconfig/20240229-193643-root.json
  • 19:35 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.20 refs T354438
  • 19:35 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 19:32 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: host reimage
  • 19:14 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host lvs2012.codfw.wmnet with OS bullseye
  • 19:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2012 - cmooney@cumin1002"
  • 19:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lvs2012.codfw.wmnet on all recursors
  • 19:06 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache lvs2012.codfw.wmnet on all recursors
  • 19:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for lvs2012 - cmooney@cumin1002"
  • 19:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 18:59 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=(recdns|ntp)
  • 18:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns
  • 18:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=ntp
  • 18:40 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr[1-2]-codfw with reason: lvs moves to per-rack vlans
  • 18:40 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr[1-2]-codfw with reason: lvs moves to per-rack vlans
  • 18:37 topranks: disabling PyBal on lvs2012 to move traffic to lvs2014 ahead of reimage T352918
  • 18:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: Moving lvs2012 primary interface from private1-b-codfw to private1-b2-codfw
  • 18:13 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2012.codfw.wmnet with reason: Moving lvs2012 primary interface from private1-b-codfw to private1-b2-codfw
  • 18:12 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:12 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:12 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:11 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:11 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:10 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'sretest1001.eqiad.wmnet$' on ulsfo recursors
  • 18:10 volans@cumin1002: START - Cookbook sre.dns.wipe-cache 'sretest1001.eqiad.wmnet$' on ulsfo recursors
  • 18:06 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
  • 18:05 volans@cumin1002: START - Cookbook sre.dns.wipe-cache sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
  • 16:53 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1173.eqiad.wmnet with reason: Investigating disk errors
  • 16:53 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1173.eqiad.wmnet with reason: Investigating disk errors
  • 16:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 16:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 16:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 16:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Excersie over', diff saved to https://phabricator.wikimedia.org/P58262 and previous config saved to /var/cache/conftool/dbconfig/20240229-163459-root.json
  • 16:27 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 16:26 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Excersie over', diff saved to https://phabricator.wikimedia.org/P58261 and previous config saved to /var/cache/conftool/dbconfig/20240229-161954-root.json
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P58260 and previous config saved to /var/cache/conftool/dbconfig/20240229-161629-root.json
  • 16:12 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 16:12 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 16:10 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 16:09 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 16:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 16:05 topranks: Commencing network maintenance migrating servers to new switch codfw rack B7 T355872
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Excersie over', diff saved to https://phabricator.wikimedia.org/P58259 and previous config saved to /var/cache/conftool/dbconfig/20240229-160449-root.json
  • 16:02 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: Migrating servers in codfw rack B7 to lsw1-b7-codfw
  • 16:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: Migrating servers in codfw rack B7 to lsw1-b7-codfw
  • 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b7-codfw with reason: prepping for server uplink migration codfw rack b7
  • 16:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b7-codfw with reason: prepping for server uplink migration codfw rack b7
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P58258 and previous config saved to /var/cache/conftool/dbconfig/20240229-160124-root.json
  • 15:59 topranks: configuring lsw1-b7-codfw in advance of server migration T355872
  • 15:52 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 15:52 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Excersie over', diff saved to https://phabricator.wikimedia.org/P58257 and previous config saved to /var/cache/conftool/dbconfig/20240229-154944-root.json
  • 15:48 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 15:48 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P58256 and previous config saved to /var/cache/conftool/dbconfig/20240229-154619-root.json
  • 15:46 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 15:43 moritzm: installing tar security updates
  • 15:40 swfrench@cumin2002: dbctl commit (dc=all): 'Depooling db1213 for exercise', diff saved to https://phabricator.wikimedia.org/P58255 and previous config saved to /var/cache/conftool/dbconfig/20240229-154005-swfrench.json
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T354015)', diff saved to https://phabricator.wikimedia.org/P58254 and previous config saved to /var/cache/conftool/dbconfig/20240229-153658-marostegui.json
  • 15:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T354015)', diff saved to https://phabricator.wikimedia.org/P58253 and previous config saved to /var/cache/conftool/dbconfig/20240229-153646-marostegui.json
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P58252 and previous config saved to /var/cache/conftool/dbconfig/20240229-153115-root.json
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P58251 and previous config saved to /var/cache/conftool/dbconfig/20240229-152139-marostegui.json
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P58250 and previous config saved to /var/cache/conftool/dbconfig/20240229-151610-root.json
  • 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: Maint test
  • 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: Maint test
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P58248 and previous config saved to /var/cache/conftool/dbconfig/20240229-150632-marostegui.json
  • 15:02 Daimona: T357007 Running mwscript CampaignEvents:GenerateInvitationList --wiki=metawiki --listfile=/home/daimona/list.txt
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 5%: After recloning', diff saved to https://phabricator.wikimedia.org/P58247 and previous config saved to /var/cache/conftool/dbconfig/20240229-150105-root.json
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T354015)', diff saved to https://phabricator.wikimedia.org/P58246 and previous config saved to /var/cache/conftool/dbconfig/20240229-145125-marostegui.json
  • 14:44 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:44 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Remove unused Phan suppression, Bump special-new-lexeme, fix redirect without temp user (T358754) (duration: 10m 08s)
  • 14:41 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:41 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 14:36 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync
  • 14:36 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for Remove unused Phan suppression, Bump special-new-lexeme, fix redirect without temp user (T358754) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:34 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Remove unused Phan suppression, Bump special-new-lexeme, fix redirect without temp user (T358754)
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy2002 backport Canceled
  • 14:12 urbanecm@deploy2002: Finished scap: Backport for cswiki, commonswiki, enwiki: Lift IP cap for Women in Science Editathon (T358755) (duration: 09m 42s)
  • 14:04 urbanecm@deploy2002: anzx and urbanecm: Continuing with sync
  • 14:04 urbanecm@deploy2002: anzx and urbanecm: Backport for cswiki, commonswiki, enwiki: Lift IP cap for Women in Science Editathon (T358755) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:03 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 14:02 urbanecm@deploy2002: Started scap: Backport for cswiki, commonswiki, enwiki: Lift IP cap for Women in Science Editathon (T358755)
  • 14:02 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jfishback out of all services on: 8 hosts
  • 14:02 root@cumin2002: START - Cookbook sre.idm.logout Logging Jfishback out of all services on: 8 hosts
  • 14:02 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:02 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 14:01 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:01 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 12:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T352010)', diff saved to https://phabricator.wikimedia.org/P58245 and previous config saved to /var/cache/conftool/dbconfig/20240229-125723-ladsgroup.json
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P58244 and previous config saved to /var/cache/conftool/dbconfig/20240229-124215-ladsgroup.json
  • 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P58242 and previous config saved to /var/cache/conftool/dbconfig/20240229-122709-ladsgroup.json
  • 12:16 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 12:14 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 12:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T352010)', diff saved to https://phabricator.wikimedia.org/P58240 and previous config saved to /var/cache/conftool/dbconfig/20240229-121202-ladsgroup.json
  • 12:04 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 12:03 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 12:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T357189)', diff saved to https://phabricator.wikimedia.org/P58239 and previous config saved to /var/cache/conftool/dbconfig/20240229-120335-arnaudb.json
  • 12:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T357189)', diff saved to https://phabricator.wikimedia.org/P58238 and previous config saved to /var/cache/conftool/dbconfig/20240229-120312-arnaudb.json
  • 12:02 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 12:01 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 12:00 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:00 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 11:55 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:55 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:55 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:55 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 11:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 11:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 11:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P58236 and previous config saved to /var/cache/conftool/dbconfig/20240229-114806-arnaudb.json
  • 11:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 11:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 11:36 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:36 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 11:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 11:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 11:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P58235 and previous config saved to /var/cache/conftool/dbconfig/20240229-113259-arnaudb.json
  • 11:27 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:27 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 11:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T357189)', diff saved to https://phabricator.wikimedia.org/P58234 and previous config saved to /var/cache/conftool/dbconfig/20240229-111753-arnaudb.json
  • 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T357189)', diff saved to https://phabricator.wikimedia.org/P58233 and previous config saved to /var/cache/conftool/dbconfig/20240229-111247-arnaudb.json
  • 11:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T357189)', diff saved to https://phabricator.wikimedia.org/P58232 and previous config saved to /var/cache/conftool/dbconfig/20240229-111215-arnaudb.json
  • 11:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2156.codfw.wmnet onto db2190.codfw.wmnet
  • 10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P58231 and previous config saved to /var/cache/conftool/dbconfig/20240229-105708-arnaudb.json
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58230 and previous config saved to /var/cache/conftool/dbconfig/20240229-104437-root.json
  • 10:42 arnaudb@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58229 and previous config saved to /var/cache/conftool/dbconfig/20240229-104223-arnaudb.json
  • 10:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P58228 and previous config saved to /var/cache/conftool/dbconfig/20240229-104202-arnaudb.json
  • 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58227 and previous config saved to /var/cache/conftool/dbconfig/20240229-103431-arnaudb.json
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58226 and previous config saved to /var/cache/conftool/dbconfig/20240229-102932-root.json
  • 10:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58225 and previous config saved to /var/cache/conftool/dbconfig/20240229-102719-arnaudb.json
  • 10:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T357189)', diff saved to https://phabricator.wikimedia.org/P58224 and previous config saved to /var/cache/conftool/dbconfig/20240229-102656-arnaudb.json
  • 10:26 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-tool1005.eqiad.wmnet
  • 10:26 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-tool1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
  • 10:24 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-tool1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
  • 10:24 claime: Cordoning kubernetes2023.codfw.wmnet for vlan change cookbook tests - T350152
  • 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T357189)', diff saved to https://phabricator.wikimedia.org/P58223 and previous config saved to /var/cache/conftool/dbconfig/20240229-102143-arnaudb.json
  • 10:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T357189)', diff saved to https://phabricator.wikimedia.org/P58222 and previous config saved to /var/cache/conftool/dbconfig/20240229-102102-arnaudb.json
  • 10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58221 and previous config saved to /var/cache/conftool/dbconfig/20240229-101926-arnaudb.json
  • 10:17 joal@deploy2002: Finished deploy [analytics/refinery@6e8f25b] (hadoop-test): Additional analytics weekly train - TEST [analytics/refinery@6e8f25b3] (duration: 03m 41s)
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58220 and previous config saved to /var/cache/conftool/dbconfig/20240229-101427-root.json
  • 10:13 joal@deploy2002: Started deploy [analytics/refinery@6e8f25b] (hadoop-test): Additional analytics weekly train - TEST [analytics/refinery@6e8f25b3]
  • 10:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58219 and previous config saved to /var/cache/conftool/dbconfig/20240229-101214-arnaudb.json
  • 10:11 joal@deploy2002: Finished deploy [analytics/refinery@6e8f25b] (thin): Additional analytics weekly train - THIN [analytics/refinery@6e8f25b3] (duration: 00m 05s)
  • 10:11 joal@deploy2002: Started deploy [analytics/refinery@6e8f25b] (thin): Additional analytics weekly train - THIN [analytics/refinery@6e8f25b3]
  • 10:11 joal@deploy2002: Finished deploy [analytics/refinery@6e8f25b]: Additional analytics weekly train [analytics/refinery@6e8f25b3] (duration: 11m 39s)
  • 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P58218 and previous config saved to /var/cache/conftool/dbconfig/20240229-100556-arnaudb.json
  • 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58217 and previous config saved to /var/cache/conftool/dbconfig/20240229-100421-arnaudb.json
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58216 and previous config saved to /var/cache/conftool/dbconfig/20240229-095923-root.json
  • 09:59 joal@deploy2002: Started deploy [analytics/refinery@6e8f25b]: Additional analytics weekly train [analytics/refinery@6e8f25b3]
  • 09:59 arnaudb@cumin1002: dbctl commit (dc=all): 'T356240 ', diff saved to https://phabricator.wikimedia.org/P58215 and previous config saved to /var/cache/conftool/dbconfig/20240229-095918-arnaudb.json
  • 09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2117.codfw.wmnet with reason: Silence for maintenance T356240
  • 09:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db2117.codfw.wmnet with reason: Silence for maintenance T356240
  • 09:57 arnaudb@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58214 and previous config saved to /var/cache/conftool/dbconfig/20240229-095709-arnaudb.json
  • 09:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1224.eqiad.wmnet
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P58213 and previous config saved to /var/cache/conftool/dbconfig/20240229-095425-root.json
  • 09:51 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1224.eqiad.wmnet
  • 09:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1224.eqiad.wmnet with reason: Silence for maintenance T356240
  • 09:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db1224.eqiad.wmnet with reason: Silence for maintenance T356240
  • 09:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P58212 and previous config saved to /var/cache/conftool/dbconfig/20240229-095049-arnaudb.json
  • 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'T356240 reboot', diff saved to https://phabricator.wikimedia.org/P58211 and previous config saved to /var/cache/conftool/dbconfig/20240229-094945-arnaudb.json
  • 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58210 and previous config saved to /var/cache/conftool/dbconfig/20240229-094915-arnaudb.json
  • 09:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T352010)', diff saved to https://phabricator.wikimedia.org/P58209 and previous config saved to /var/cache/conftool/dbconfig/20240229-094429-ladsgroup.json
  • 09:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58208 and previous config saved to /var/cache/conftool/dbconfig/20240229-094418-root.json
  • 09:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1201.eqiad.wmnet
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P58207 and previous config saved to /var/cache/conftool/dbconfig/20240229-093921-root.json
  • 09:36 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1201.eqiad.wmnet
  • 09:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T357189)', diff saved to https://phabricator.wikimedia.org/P58206 and previous config saved to /var/cache/conftool/dbconfig/20240229-093543-arnaudb.json
  • 09:34 brouberol@cumin1002: START - Cookbook sre.dns.netbox
  • 09:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T357189)', diff saved to https://phabricator.wikimedia.org/P58205 and previous config saved to /var/cache/conftool/dbconfig/20240229-093025-arnaudb.json
  • 09:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 09:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 09:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T357189)', diff saved to https://phabricator.wikimedia.org/P58204 and previous config saved to /var/cache/conftool/dbconfig/20240229-093003-arnaudb.json
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'Promote back es2034 to es3 codfw master T358180', diff saved to https://phabricator.wikimedia.org/P58203 and previous config saved to /var/cache/conftool/dbconfig/20240229-092929-marostegui.json
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58202 and previous config saved to /var/cache/conftool/dbconfig/20240229-092913-root.json
  • 09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'depooling for maintenance - reboot', diff saved to https://phabricator.wikimedia.org/P58201 and previous config saved to /var/cache/conftool/dbconfig/20240229-092853-arnaudb.json
  • 09:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2034.codfw.wmnet with OS bookworm
  • 09:26 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts an-tool1005.eqiad.wmnet
  • 09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2144.codfw.wmnet,db1201.eqiad.wmnet with reason: Silence for maintenance T356240
  • 09:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db2144.codfw.wmnet,db1201.eqiad.wmnet with reason: Silence for maintenance T356240
  • 09:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P58200 and previous config saved to /var/cache/conftool/dbconfig/20240229-092416-root.json
  • 09:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P58199 and previous config saved to /var/cache/conftool/dbconfig/20240229-091457-arnaudb.json
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P58198 and previous config saved to /var/cache/conftool/dbconfig/20240229-090911-root.json
  • 09:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2034.codfw.wmnet with reason: host reimage
  • 09:07 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2156.codfw.wmnet onto db2190.codfw.wmnet
  • 09:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2034.codfw.wmnet with reason: host reimage
  • 08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P58197 and previous config saved to /var/cache/conftool/dbconfig/20240229-085951-arnaudb.json
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P58196 and previous config saved to /var/cache/conftool/dbconfig/20240229-085406-root.json
  • 08:52 kartik@deploy2002: Finished scap: Backport for Section Translation: Add 'nb' in target language code (T353734) (duration: 12m 45s)
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58195 and previous config saved to /var/cache/conftool/dbconfig/20240229-085021-root.json
  • 08:47 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2034.codfw.wmnet with OS bookworm
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2034 T358180', diff saved to https://phabricator.wikimedia.org/P58194 and previous config saved to /var/cache/conftool/dbconfig/20240229-084541-root.json
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Promote back es2029 to es3 codfw master T358180', diff saved to https://phabricator.wikimedia.org/P58193 and previous config saved to /var/cache/conftool/dbconfig/20240229-084502-marostegui.json
  • 08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T357189)', diff saved to https://phabricator.wikimedia.org/P58192 and previous config saved to /var/cache/conftool/dbconfig/20240229-084444-arnaudb.json
  • 08:44 kartik@deploy2002: kartik: Continuing with sync
  • 08:40 kartik@deploy2002: kartik: Backport for Section Translation: Add 'nb' in target language code (T353734) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T357189)', diff saved to https://phabricator.wikimedia.org/P58191 and previous config saved to /var/cache/conftool/dbconfig/20240229-083928-arnaudb.json
  • 08:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 08:39 kartik@deploy2002: Started scap: Backport for Section Translation: Add 'nb' in target language code (T353734)
  • 08:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Pooling for the first time', diff saved to https://phabricator.wikimedia.org/P58190 and previous config saved to /var/cache/conftool/dbconfig/20240229-083901-root.json
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58189 and previous config saved to /var/cache/conftool/dbconfig/20240229-083517-root.json
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T354015)', diff saved to https://phabricator.wikimedia.org/P58188 and previous config saved to /var/cache/conftool/dbconfig/20240229-082602-marostegui.json
  • 08:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58185 and previous config saved to /var/cache/conftool/dbconfig/20240229-080507-root.json
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58184 and previous config saved to /var/cache/conftool/dbconfig/20240229-080449-root.json
  • 08:04 kartik@deploy2002: kartik: Backport for Enable Section translation on Wikipedias with Content Translation available as default (T351882) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:03 kartik@deploy2002: Started scap: Backport for Enable Section translation on Wikipedias with Content Translation available as default (T351882)
  • 07:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1003.wikimedia.org with OS bookworm
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58183 and previous config saved to /var/cache/conftool/dbconfig/20240229-075002-root.json
  • 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58182 and previous config saved to /var/cache/conftool/dbconfig/20240229-074944-root.json
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote back es1034 to es3 eqiad master T358180', diff saved to https://phabricator.wikimedia.org/P58181 and previous config saved to /var/cache/conftool/dbconfig/20240229-073523-marostegui.json
  • 07:35 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1003.wikimedia.org with reason: host reimage
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58180 and previous config saved to /var/cache/conftool/dbconfig/20240229-073457-root.json
  • 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58179 and previous config saved to /var/cache/conftool/dbconfig/20240229-073440-root.json
  • 07:32 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1003.wikimedia.org with reason: host reimage
  • 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58178 and previous config saved to /var/cache/conftool/dbconfig/20240229-071935-root.json
  • 07:19 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1003.wikimedia.org with OS bookworm
  • 07:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: sretest
  • 07:14 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: sretest
  • 07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1034.eqiad.wmnet with OS bookworm
  • 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58177 and previous config saved to /var/cache/conftool/dbconfig/20240229-070430-root.json
  • 06:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1034.eqiad.wmnet with reason: host reimage
  • 06:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1034.eqiad.wmnet with reason: host reimage
  • 06:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58176 and previous config saved to /var/cache/conftool/dbconfig/20240229-064925-root.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2218 with 5% weight only', diff saved to https://phabricator.wikimedia.org/P58175 and previous config saved to /var/cache/conftool/dbconfig/20240229-064402-marostegui.json
  • 06:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1034.eqiad.wmnet with OS bookworm
  • 06:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on es1034.eqiad.wmnet with reason: Reimage
  • 06:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on es1034.eqiad.wmnet with reason: Reimage
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1034 T358180', diff saved to https://phabricator.wikimedia.org/P58174 and previous config saved to /var/cache/conftool/dbconfig/20240229-063502-root.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: After optimizing revision table', diff saved to https://phabricator.wikimedia.org/P58173 and previous config saved to /var/cache/conftool/dbconfig/20240229-063420-root.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2118 from dbctl', diff saved to https://phabricator.wikimedia.org/P58172 and previous config saved to /var/cache/conftool/dbconfig/20240229-063412-marostegui.json
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2218 with 1% weight only T358421 T355422', diff saved to https://phabricator.wikimedia.org/P58171 and previous config saved to /var/cache/conftool/dbconfig/20240229-062601-marostegui.json
  • 06:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T352010)', diff saved to https://phabricator.wikimedia.org/P58170 and previous config saved to /var/cache/conftool/dbconfig/20240229-060721-ladsgroup.json
  • 05:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P58169 and previous config saved to /var/cache/conftool/dbconfig/20240229-055215-ladsgroup.json
  • 05:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P58168 and previous config saved to /var/cache/conftool/dbconfig/20240229-053708-ladsgroup.json
  • 05:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T352010)', diff saved to https://phabricator.wikimedia.org/P58167 and previous config saved to /var/cache/conftool/dbconfig/20240229-052202-ladsgroup.json
  • 04:30 TimStarling: on mwmaint2002 running migrateBlocks.php on all wikis
  • 04:19 tstarling@deploy2002: Synchronized wmf-config/CommonSettings.php: Switch block schema to read-old/write-both mode T355034 (duration: 08m 47s)
  • 03:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T352010)', diff saved to https://phabricator.wikimedia.org/P58165 and previous config saved to /var/cache/conftool/dbconfig/20240229-030309-ladsgroup.json
  • 03:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T352010)', diff saved to https://phabricator.wikimedia.org/P58164 and previous config saved to /var/cache/conftool/dbconfig/20240229-030247-ladsgroup.json
  • 02:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P58163 and previous config saved to /var/cache/conftool/dbconfig/20240229-024741-ladsgroup.json
  • 02:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P58162 and previous config saved to /var/cache/conftool/dbconfig/20240229-023234-ladsgroup.json
  • 02:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T352010)', diff saved to https://phabricator.wikimedia.org/P58161 and previous config saved to /var/cache/conftool/dbconfig/20240229-021728-ladsgroup.json
  • 01:50 Krinkle: ruwiktionary `UPDATE page SET page_namespace=1,page_title=CONCAT('Broken/NS2303:',page_title) WHERE page_id=2469241 AND page_namespace=2303;` T31272
  • 01:49 Krinkle: ruwiktionary `UPDATE page SET page_namespace=1,page_title=CONCAT('Broken/NS2301:',page_title) WHERE page_id=2469240 AND page_namespace=2301` T31272
  • 00:59 ayounsi@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host testvm2006.codfw.wmnet
  • 00:59 ayounsi@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2006.codfw.wmnet with OS bookworm

2024-02-28

  • 23:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T352010)', diff saved to https://phabricator.wikimedia.org/P58159 and previous config saved to /var/cache/conftool/dbconfig/20240228-232800-ladsgroup.json
  • 23:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 23:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 23:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T357189)', diff saved to https://phabricator.wikimedia.org/P58158 and previous config saved to /var/cache/conftool/dbconfig/20240228-230015-arnaudb.json
  • 22:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P58157 and previous config saved to /var/cache/conftool/dbconfig/20240228-224508-arnaudb.json
  • 22:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P58156 and previous config saved to /var/cache/conftool/dbconfig/20240228-223002-arnaudb.json
  • 22:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T357189)', diff saved to https://phabricator.wikimedia.org/P58155 and previous config saved to /var/cache/conftool/dbconfig/20240228-221456-arnaudb.json
  • 22:14 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2043*,2044*,2079*,2080* for switch maintenance - bking@cumin2002 - T355872
  • 22:13 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2043*,2044*,2079*,2080* for switch maintenance - bking@cumin2002 - T355872
  • 21:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T357189)', diff saved to https://phabricator.wikimedia.org/P58154 and previous config saved to /var/cache/conftool/dbconfig/20240228-211823-arnaudb.json
  • 21:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 21:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 21:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T357189)', diff saved to https://phabricator.wikimedia.org/P58153 and previous config saved to /var/cache/conftool/dbconfig/20240228-211801-arnaudb.json
  • 21:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P58152 and previous config saved to /var/cache/conftool/dbconfig/20240228-210254-arnaudb.json
  • 20:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T352010)', diff saved to https://phabricator.wikimedia.org/P58151 and previous config saved to /var/cache/conftool/dbconfig/20240228-205308-ladsgroup.json
  • 20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P58150 and previous config saved to /var/cache/conftool/dbconfig/20240228-204748-arnaudb.json
  • 20:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P58149 and previous config saved to /var/cache/conftool/dbconfig/20240228-203802-ladsgroup.json
  • 20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T357189)', diff saved to https://phabricator.wikimedia.org/P58148 and previous config saved to /var/cache/conftool/dbconfig/20240228-203241-arnaudb.json
  • 20:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T357189)', diff saved to https://phabricator.wikimedia.org/P58147 and previous config saved to /var/cache/conftool/dbconfig/20240228-202435-arnaudb.json
  • 20:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T357189)', diff saved to https://phabricator.wikimedia.org/P58146 and previous config saved to /var/cache/conftool/dbconfig/20240228-202413-arnaudb.json
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P58145 and previous config saved to /var/cache/conftool/dbconfig/20240228-202256-ladsgroup.json
  • 20:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P58144 and previous config saved to /var/cache/conftool/dbconfig/20240228-200906-arnaudb.json
  • 20:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T352010)', diff saved to https://phabricator.wikimedia.org/P58143 and previous config saved to /var/cache/conftool/dbconfig/20240228-200748-ladsgroup.json
  • 19:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P58142 and previous config saved to /var/cache/conftool/dbconfig/20240228-195400-arnaudb.json
  • 19:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T357189)', diff saved to https://phabricator.wikimedia.org/P58141 and previous config saved to /var/cache/conftool/dbconfig/20240228-193854-arnaudb.json
  • 19:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T357189)', diff saved to https://phabricator.wikimedia.org/P58140 and previous config saved to /var/cache/conftool/dbconfig/20240228-193133-arnaudb.json
  • 19:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T357189)', diff saved to https://phabricator.wikimedia.org/P58139 and previous config saved to /var/cache/conftool/dbconfig/20240228-193111-arnaudb.json
  • 19:22 dduvall@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.20 refs T354438 (duration: 08m 37s)
  • 19:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P58138 and previous config saved to /var/cache/conftool/dbconfig/20240228-191605-arnaudb.json
  • 19:14 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2118.codfw.wmnet onto db2218.codfw.wmnet
  • 19:14 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.20 refs T354438
  • 19:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P58136 and previous config saved to /var/cache/conftool/dbconfig/20240228-190059-arnaudb.json
  • 18:50 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 18:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 18:49 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:49 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 18:49 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 18:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 18:46 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbprov1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T357189)', diff saved to https://phabricator.wikimedia.org/P58135 and previous config saved to /var/cache/conftool/dbconfig/20240228-184552-arnaudb.json
  • 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T357189)', diff saved to https://phabricator.wikimedia.org/P58134 and previous config saved to /var/cache/conftool/dbconfig/20240228-183915-arnaudb.json
  • 18:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 18:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 18:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T357189)', diff saved to https://phabricator.wikimedia.org/P58133 and previous config saved to /var/cache/conftool/dbconfig/20240228-183853-arnaudb.json
  • 18:34 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbprov1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P58132 and previous config saved to /var/cache/conftool/dbconfig/20240228-182347-arnaudb.json
  • 18:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host dbprov1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:13 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbprov1006 - vriley@cumin1002"
  • 18:13 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbprov1006 - vriley@cumin1002"
  • 18:10 sbailey@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 18:10 sbailey@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 18:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P58131 and previous config saved to /var/cache/conftool/dbconfig/20240228-180840-arnaudb.json
  • 18:08 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T357189)', diff saved to https://phabricator.wikimedia.org/P58130 and previous config saved to /var/cache/conftool/dbconfig/20240228-175333-arnaudb.json
  • 17:52 sbailey@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 17:52 sbailey@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 17:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host dbprov1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:48 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbprov1005 - vriley@cumin1002"
  • 17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T357189)', diff saved to https://phabricator.wikimedia.org/P58129 and previous config saved to /var/cache/conftool/dbconfig/20240228-174759-arnaudb.json
  • 17:47 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbprov1005 - vriley@cumin1002"
  • 17:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 17:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T357189)', diff saved to https://phabricator.wikimedia.org/P58128 and previous config saved to /var/cache/conftool/dbconfig/20240228-174720-arnaudb.json
  • 17:46 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-redacteddb1001']
  • 17:38 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-redacteddb1001']
  • 17:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P58127 and previous config saved to /var/cache/conftool/dbconfig/20240228-173214-arnaudb.json
  • 17:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P58126 and previous config saved to /var/cache/conftool/dbconfig/20240228-171707-arnaudb.json
  • 17:16 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2118.codfw.wmnet onto db2218.codfw.wmnet
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2218 depooled T355422', diff saved to https://phabricator.wikimedia.org/P58125 and previous config saved to /var/cache/conftool/dbconfig/20240228-171633-marostegui.json
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T352010)', diff saved to https://phabricator.wikimedia.org/P58124 and previous config saved to /var/cache/conftool/dbconfig/20240228-171157-ladsgroup.json
  • 17:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T352010)', diff saved to https://phabricator.wikimedia.org/P58123 and previous config saved to /var/cache/conftool/dbconfig/20240228-171136-ladsgroup.json
  • 17:03 sukhe: running dummy authdns-update
  • 17:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T357189)', diff saved to https://phabricator.wikimedia.org/P58122 and previous config saved to /var/cache/conftool/dbconfig/20240228-170201-arnaudb.json
  • 17:01 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1028 to es3 eqiad master T358180', diff saved to https://phabricator.wikimedia.org/P58121 and previous config saved to /var/cache/conftool/dbconfig/20240228-170134-marostegui.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58120 and previous config saved to /var/cache/conftool/dbconfig/20240228-165841-arnaudb.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58119 and previous config saved to /var/cache/conftool/dbconfig/20240228-165832-arnaudb.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2096 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58118 and previous config saved to /var/cache/conftool/dbconfig/20240228-165832-arnaudb.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58117 and previous config saved to /var/cache/conftool/dbconfig/20240228-165823-arnaudb.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58116 and previous config saved to /var/cache/conftool/dbconfig/20240228-165815-arnaudb.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58115 and previous config saved to /var/cache/conftool/dbconfig/20240228-165806-arnaudb.json
  • 16:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P58114 and previous config saved to /var/cache/conftool/dbconfig/20240228-165629-ladsgroup.json
  • 16:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T357189)', diff saved to https://phabricator.wikimedia.org/P58113 and previous config saved to /var/cache/conftool/dbconfig/20240228-165315-arnaudb.json
  • 16:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T357189)', diff saved to https://phabricator.wikimedia.org/P58112 and previous config saved to /var/cache/conftool/dbconfig/20240228-165253-arnaudb.json
  • 16:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Optimize revision table T354015
  • 16:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Optimize revision table T354015
  • 16:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T354015', diff saved to https://phabricator.wikimedia.org/P58111 and previous config saved to /var/cache/conftool/dbconfig/20240228-164451-root.json
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58110 and previous config saved to /var/cache/conftool/dbconfig/20240228-164337-arnaudb.json
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58109 and previous config saved to /var/cache/conftool/dbconfig/20240228-164327-arnaudb.json
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2096 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58108 and previous config saved to /var/cache/conftool/dbconfig/20240228-164321-arnaudb.json
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58107 and previous config saved to /var/cache/conftool/dbconfig/20240228-164312-arnaudb.json
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58106 and previous config saved to /var/cache/conftool/dbconfig/20240228-164310-arnaudb.json
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58105 and previous config saved to /var/cache/conftool/dbconfig/20240228-164301-arnaudb.json
  • 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P58104 and previous config saved to /var/cache/conftool/dbconfig/20240228-164123-ladsgroup.json
  • 16:40 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:39 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P58103 and previous config saved to /var/cache/conftool/dbconfig/20240228-163747-arnaudb.json
  • 16:31 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=mw23(2[5-9]|3[0-4]).codfw.wmnet
  • 16:28 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58102 and previous config saved to /var/cache/conftool/dbconfig/20240228-162832-arnaudb.json
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58101 and previous config saved to /var/cache/conftool/dbconfig/20240228-162823-arnaudb.json
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2096 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58100 and previous config saved to /var/cache/conftool/dbconfig/20240228-162816-arnaudb.json
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58099 and previous config saved to /var/cache/conftool/dbconfig/20240228-162807-arnaudb.json
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58098 and previous config saved to /var/cache/conftool/dbconfig/20240228-162806-arnaudb.json
  • 16:27 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58097 and previous config saved to /var/cache/conftool/dbconfig/20240228-162756-arnaudb.json
  • 16:27 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:27 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T352010)', diff saved to https://phabricator.wikimedia.org/P58096 and previous config saved to /var/cache/conftool/dbconfig/20240228-162616-ladsgroup.json
  • 16:25 topranks: Disabling IPv6 RAs for private1-b-codfw vlan on codfw CR routers, moving GW to lsw/ssw T355544
  • 16:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P58095 and previous config saved to /var/cache/conftool/dbconfig/20240228-162240-arnaudb.json
  • 16:21 dancy@deploy2002: Finished scap: testing new scap release (duration: 09m 12s)
  • 16:18 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:17 sukhe: sudo cumin 'A:dns-rec' "run-puppet-agent --enable 'merging CR 1006955'"
  • 16:17 moritzm: import cas 6.6.12+wmf12u3 to bookworm-wikimedia T357748
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58094 and previous config saved to /var/cache/conftool/dbconfig/20240228-161327-arnaudb.json
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58093 and previous config saved to /var/cache/conftool/dbconfig/20240228-161318-arnaudb.json
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2096 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58092 and previous config saved to /var/cache/conftool/dbconfig/20240228-161312-arnaudb.json
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2124 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58091 and previous config saved to /var/cache/conftool/dbconfig/20240228-161303-arnaudb.json
  • 16:13 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2111 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58090 and previous config saved to /var/cache/conftool/dbconfig/20240228-161254-arnaudb.json
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2110 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58089 and previous config saved to /var/cache/conftool/dbconfig/20240228-161251-arnaudb.json
  • 16:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=authdns-update
  • 16:12 dancy@deploy2002: Started scap: testing new scap release
  • 16:12 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=authdns-update
  • 16:11 dancy@deploy2002: Installation of scap version "4.67.0" completed for 445 hosts
  • 16:11 dancy@deploy2002: Installing scap version "4.67.0" for 445 hosts
  • 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T357189)', diff saved to https://phabricator.wikimedia.org/P58088 and previous config saved to /var/cache/conftool/dbconfig/20240228-160734-arnaudb.json
  • 16:06 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:06 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 37 hosts with reason: Migrating servers in codfw rack B6 to lsw1-b6-codfw
  • 16:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 37 hosts with reason: Migrating servers in codfw rack B6 to lsw1-b6-codfw
  • 16:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2104 (T357189)', diff saved to https://phabricator.wikimedia.org/P58087 and previous config saved to /var/cache/conftool/dbconfig/20240228-160202-arnaudb.json
  • 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:59 sukhe: sudo cumin "A:dns-rec" "disable-puppet 'merging CR 1006955'"
  • 15:57 samtar@deploy2002: Finished scap: Backport for InitialiseSettings: Enable Edit Recovery on arwiki (T355548) (duration: 10m 10s)
  • 15:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2198.codfw.wmnet with OS bookworm
  • 15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:51 topranks: configuring lsw1-b6-codfw in advance of server migration T355871
  • 15:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T357189)', diff saved to https://phabricator.wikimedia.org/P58086 and previous config saved to /var/cache/conftool/dbconfig/20240228-155113-arnaudb.json
  • 15:49 samtar@deploy2002: samtar: Continuing with sync
  • 15:49 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b6-codfw.mgmt with reason: prepping for server uplink migration codfw rack b6
  • 15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b6-codfw.mgmt with reason: prepping for server uplink migration codfw rack b6
  • 15:48 samtar@deploy2002: samtar: Backport for InitialiseSettings: Enable Edit Recovery on arwiki (T355548) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 15:46 samtar@deploy2002: Started scap: Backport for InitialiseSettings: Enable Edit Recovery on arwiki (T355548)
  • 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2197.codfw.wmnet with OS bookworm
  • 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'T355871 - depooling db2110 db2111 db2124 db2134 db2096 db2161 db2162', diff saved to https://phabricator.wikimedia.org/P58085 and previous config saved to /var/cache/conftool/dbconfig/20240228-154043-arnaudb.json
  • 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2198.codfw.wmnet with reason: host reimage
  • 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 7 hosts with reason: Silence for maintenance T355871
  • 15:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 7 hosts with reason: Silence for maintenance T355871
  • 15:37 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2198.codfw.wmnet with reason: host reimage
  • 15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P58084 and previous config saved to /var/cache/conftool/dbconfig/20240228-153607-arnaudb.json
  • 15:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:35 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=mw23(2[5-9]|3[0-4]).codfw.wmnet
  • 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2215.codfw.wmnet with OS bookworm
  • 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:30 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:28 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:28 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:25 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:25 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:23 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P58083 and previous config saved to /var/cache/conftool/dbconfig/20240228-152101-arnaudb.json
  • 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: host reimage
  • 15:18 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2215.codfw.wmnet with reason: host reimage
  • 15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2198.codfw.wmnet with OS bookworm
  • 15:17 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:17 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:15 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2197.codfw.wmnet with reason: host reimage
  • 15:15 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:14 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2215.codfw.wmnet with reason: host reimage
  • 15:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:10 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:09 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2198.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T357189)', diff saved to https://phabricator.wikimedia.org/P58082 and previous config saved to /var/cache/conftool/dbconfig/20240228-150554-arnaudb.json
  • 15:04 fab@deploy2002: Finished deploy [airflow-dags/research@4bed377]: (no justification provided) (duration: 00m 42s)
  • 15:03 fab@deploy2002: Started deploy [airflow-dags/research@4bed377]: (no justification provided)
  • 15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T357189)', diff saved to https://phabricator.wikimedia.org/P58081 and previous config saved to /var/cache/conftool/dbconfig/20240228-145958-arnaudb.json
  • 14:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 14:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2198.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2197.codfw.wmnet with OS bookworm
  • 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T357189)', diff saved to https://phabricator.wikimedia.org/P58080 and previous config saved to /var/cache/conftool/dbconfig/20240228-145457-arnaudb.json
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2215.codfw.wmnet with OS bookworm
  • 14:40 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:39 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P58079 and previous config saved to /var/cache/conftool/dbconfig/20240228-143951-arnaudb.json
  • 14:39 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:39 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 daniel@deploy2002: Finished scap: Backport for Configure parser cache filters for parsoid-pcache (T346765 T355375) (duration: 14m 56s)
  • 14:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:32 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:30 daniel@deploy2002: daniel: Continuing with sync
  • 14:29 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on wdqs2008.codfw.wmnet with reason: T355617
  • 14:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6:00:00 on wdqs2008.codfw.wmnet with reason: T355617
  • 14:25 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:25 daniel@deploy2002: daniel: Backport for Configure parser cache filters for parsoid-pcache (T346765 T355375) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P58077 and previous config saved to /var/cache/conftool/dbconfig/20240228-142445-arnaudb.json
  • 14:24 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:24 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:23 daniel@deploy2002: Started scap: Backport for Configure parser cache filters for parsoid-pcache (T346765 T355375)
  • 14:23 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:22 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:22 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T352010)', diff saved to https://phabricator.wikimedia.org/P58076 and previous config saved to /var/cache/conftool/dbconfig/20240228-141413-ladsgroup.json
  • 14:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T357189)', diff saved to https://phabricator.wikimedia.org/P58075 and previous config saved to /var/cache/conftool/dbconfig/20240228-140938-arnaudb.json
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 100%: After running optimize', diff saved to https://phabricator.wikimedia.org/P58074 and previous config saved to /var/cache/conftool/dbconfig/20240228-140626-root.json
  • 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T357189)', diff saved to https://phabricator.wikimedia.org/P58073 and previous config saved to /var/cache/conftool/dbconfig/20240228-140346-arnaudb.json
  • 14:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 14:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T357189)', diff saved to https://phabricator.wikimedia.org/P58072 and previous config saved to /var/cache/conftool/dbconfig/20240228-140323-arnaudb.json
  • 13:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2003 - ayounsi@cumin1002"
  • 13:52 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2003 - ayounsi@cumin1002"
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 75%: After running optimize', diff saved to https://phabricator.wikimedia.org/P58071 and previous config saved to /var/cache/conftool/dbconfig/20240228-135121-root.json
  • 13:49 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P58070 and previous config saved to /var/cache/conftool/dbconfig/20240228-134817-arnaudb.json
  • 13:41 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host <spicerack.netbox.NetboxServer object at 0x7f3aaebfffa0>
  • 13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 6.0.0.0.0.1.0.0.2.9.1.0.0.1.0.0.b.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:41 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache 6.0.0.0.0.1.0.0.2.9.1.0.0.1.0.0.b.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 6.10.192.10.in-addr.arpa on all recursors
  • 13:41 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache 6.10.192.10.in-addr.arpa on all recursors
  • 13:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2003.codfw.wmnet on all recursors
  • 13:41 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2003.codfw.wmnet on all recursors
  • 13:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bookworm
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T352010)', diff saved to https://phabricator.wikimedia.org/P58069 and previous config saved to /var/cache/conftool/dbconfig/20240228-133959-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T352010)', diff saved to https://phabricator.wikimedia.org/P58068 and previous config saved to /var/cache/conftool/dbconfig/20240228-133937-ladsgroup.json
  • 13:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 13:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 5.5.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:39 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache 5.5.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 13:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 255.0.192.10.in-addr.arpa on all recursors
  • 13:39 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache 255.0.192.10.in-addr.arpa on all recursors
  • 13:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2003.codfw.wmnet on all recursors
  • 13:39 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2003.codfw.wmnet on all recursors
  • 13:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest2003 - ayounsi@cumin1002"
  • 13:38 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest2003 - ayounsi@cumin1002"
  • 13:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 50%: After running optimize', diff saved to https://phabricator.wikimedia.org/P58066 and previous config saved to /var/cache/conftool/dbconfig/20240228-133616-root.json
  • 13:36 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 13:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P58065 and previous config saved to /var/cache/conftool/dbconfig/20240228-133311-arnaudb.json
  • 13:33 ayounsi@cumin1002: START - Cookbook sre.hosts.move-vlan for host <spicerack.netbox.NetboxServer object at 0x7f3aaebfffa0>
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P58064 and previous config saved to /var/cache/conftool/dbconfig/20240228-132431-ladsgroup.json
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 25%: After running optimize', diff saved to https://phabricator.wikimedia.org/P58063 and previous config saved to /var/cache/conftool/dbconfig/20240228-132111-root.json
  • 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P58062 and previous config saved to /var/cache/conftool/dbconfig/20240228-132002-root.json
  • 13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T357189)', diff saved to https://phabricator.wikimedia.org/P58061 and previous config saved to /var/cache/conftool/dbconfig/20240228-131804-arnaudb.json
  • 13:16 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T357189)', diff saved to https://phabricator.wikimedia.org/P58060 and previous config saved to /var/cache/conftool/dbconfig/20240228-131318-arnaudb.json
  • 13:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2006.codfw.wmnet
  • 13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 13:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 13:12 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2006.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 13:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 255.0.192.10.in-addr.arpa on codfw recursors
  • 13:11 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache 255.0.192.10.in-addr.arpa on codfw recursors
  • 13:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 10.192.0.229 on codfw recursors
  • 13:11 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache 10.192.0.229 on codfw recursors
  • 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P58059 and previous config saved to /var/cache/conftool/dbconfig/20240228-130925-ladsgroup.json
  • 13:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 13:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 13:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T357189)', diff saved to https://phabricator.wikimedia.org/P58058 and previous config saved to /var/cache/conftool/dbconfig/20240228-130811-arnaudb.json
  • 13:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 10%: After running optimize', diff saved to https://phabricator.wikimedia.org/P58057 and previous config saved to /var/cache/conftool/dbconfig/20240228-130606-root.json
  • 13:05 moritzm: installing bind9 security updates
  • 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P58056 and previous config saved to /var/cache/conftool/dbconfig/20240228-130457-root.json
  • 13:03 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bookworm
  • 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 13:01 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 12:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 12:57 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts testvm2006.codfw.wmnet
  • 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T352010)', diff saved to https://phabricator.wikimedia.org/P58055 and previous config saved to /var/cache/conftool/dbconfig/20240228-125418-ladsgroup.json
  • 12:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P58054 and previous config saved to /var/cache/conftool/dbconfig/20240228-125305-arnaudb.json
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1232 (re)pooling @ 5%: After running optimize', diff saved to https://phabricator.wikimedia.org/P58053 and previous config saved to /var/cache/conftool/dbconfig/20240228-125102-root.json
  • 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P58052 and previous config saved to /var/cache/conftool/dbconfig/20240228-124953-root.json
  • 12:47 moritzm: import cas 6.6.12+wmf12u2 to bookworm-wikimedia T357748
  • 12:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P58050 and previous config saved to /var/cache/conftool/dbconfig/20240228-123759-arnaudb.json
  • 12:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bullseye
  • 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2177 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P58049 and previous config saved to /var/cache/conftool/dbconfig/20240228-123448-root.json
  • 12:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T357189)', diff saved to https://phabricator.wikimedia.org/P58048 and previous config saved to /var/cache/conftool/dbconfig/20240228-122252-arnaudb.json
  • 12:16 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 12:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T357189)', diff saved to https://phabricator.wikimedia.org/P58047 and previous config saved to /var/cache/conftool/dbconfig/20240228-121603-arnaudb.json
  • 12:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 12:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 12:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T357189)', diff saved to https://phabricator.wikimedia.org/P58046 and previous config saved to /var/cache/conftool/dbconfig/20240228-121541-arnaudb.json
  • 12:14 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 12:01 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bullseye
  • 12:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P58045 and previous config saved to /var/cache/conftool/dbconfig/20240228-120035-arnaudb.json
  • 11:57 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:54 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:52 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1007.eqiad.wmnet with OS bookworm
  • 11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P58044 and previous config saved to /var/cache/conftool/dbconfig/20240228-114529-arnaudb.json
  • 11:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2156.codfw.wmnet onto db2177.codfw.wmnet
  • 11:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 11:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T357189)', diff saved to https://phabricator.wikimedia.org/P58043 and previous config saved to /var/cache/conftool/dbconfig/20240228-113022-arnaudb.json
  • 11:27 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: host reimage
  • 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T357189)', diff saved to https://phabricator.wikimedia.org/P58042 and previous config saved to /var/cache/conftool/dbconfig/20240228-112523-arnaudb.json
  • 11:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 11:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T357189)', diff saved to https://phabricator.wikimedia.org/P58041 and previous config saved to /var/cache/conftool/dbconfig/20240228-112501-arnaudb.json
  • 11:24 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 11:23 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 11:22 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 11:22 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:19 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 11:18 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 11:14 moritzm: import cas 6.6.12+wmf12u1 to bookworm-wikimedia T357748
  • 11:13 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dbstore1007.eqiad.wmnet with OS bookworm
  • 11:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P58039 and previous config saved to /var/cache/conftool/dbconfig/20240228-110955-arnaudb.json
  • 11:03 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:02 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P58038 and previous config saved to /var/cache/conftool/dbconfig/20240228-105449-arnaudb.json
  • 10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T357189)', diff saved to https://phabricator.wikimedia.org/P58037 and previous config saved to /var/cache/conftool/dbconfig/20240228-103942-arnaudb.json
  • 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T357189)', diff saved to https://phabricator.wikimedia.org/P58036 and previous config saved to /var/cache/conftool/dbconfig/20240228-103442-arnaudb.json
  • 10:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 10:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 10:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T357189)', diff saved to https://phabricator.wikimedia.org/P58035 and previous config saved to /var/cache/conftool/dbconfig/20240228-103419-arnaudb.json
  • 10:32 claime: Lowered the weight of small disk videoscalers
  • 10:31 cgoubert@cumin2002: conftool action : set/weight=15; selector: name=mw(2259|226[3-6]|2278|2279|2281).codfw.wmnet,cluster=videoscaler
  • 10:31 moritzm: copy cas from bullseye-wikimedia to bookworm-wikimedia T357748
  • 10:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P58034 and previous config saved to /var/cache/conftool/dbconfig/20240228-101913-arnaudb.json
  • 10:18 volans: installed spicerack 8.4.0 on cumin1002
  • 10:12 claime: clearing up leftover boxedcommand media files on mw2281 - sudo find . -type f \( -name '*.wav' -o -name '*.ogg' -o -name '*.webm' -o -name '*.mov' -o -name '*.mp4' \) -mmin +1200 -exec sh -c "lsof {} || rm {}" \;
  • 10:12 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2156.codfw.wmnet onto db2177.codfw.wmnet
  • 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T352010)', diff saved to https://phabricator.wikimedia.org/P58033 and previous config saved to /var/cache/conftool/dbconfig/20240228-100720-ladsgroup.json
  • 10:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:04 claime: clearing up leftover boxedcommand media files on mw2278 - sudo find . -type f \( -name '*.wav' -o -name '*.ogg' -o -name '*.webm' -o -name '*.mov' -o -name '*.mp4' \) -mmin +1200 -exec sh -c "lsof {} || rm {}" \;
  • 10:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P58032 and previous config saved to /var/cache/conftool/dbconfig/20240228-100406-arnaudb.json
  • 10:03 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 10:00 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
  • 09:54 ladsgroup@deploy2002: Finished scap: Backport for Set three more wikis to read new on pagelinks migration (T351237) (duration: 10m 03s)
  • 09:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T357189)', diff saved to https://phabricator.wikimedia.org/P58030 and previous config saved to /var/cache/conftool/dbconfig/20240228-094900-arnaudb.json
  • 09:46 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
  • 09:46 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 09:46 joal@deploy2002: Finished deploy [analytics/refinery@dba67fd] (hadoop-test): Additional analytics weekly train - TEST [analytics/refinery@dba67fd6] (duration: 03m 33s)
  • 09:46 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - ayounsi@cumin2002"
  • 09:45 ladsgroup@deploy2002: ladsgroup: Backport for Set three more wikis to read new on pagelinks migration (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:45 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2006.codfw.wmnet - ayounsi@cumin2002"
  • 09:45 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2006.codfw.wmnet on all recursors
  • 09:44 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache testvm2006.codfw.wmnet on all recursors
  • 09:44 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:44 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - ayounsi@cumin2002"
  • 09:44 ladsgroup@deploy2002: Started scap: Backport for Set three more wikis to read new on pagelinks migration (T351237)
  • 09:42 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2006.codfw.wmnet - ayounsi@cumin2002"
  • 09:42 joal@deploy2002: Started deploy [analytics/refinery@dba67fd] (hadoop-test): Additional analytics weekly train - TEST [analytics/refinery@dba67fd6]
  • 09:42 joal@deploy2002: Finished deploy [analytics/refinery@dba67fd] (thin): Additional analytics weekly train - THIN [analytics/refinery@dba67fd6] (duration: 00m 05s)
  • 09:42 joal@deploy2002: Started deploy [analytics/refinery@dba67fd] (thin): Additional analytics weekly train - THIN [analytics/refinery@dba67fd6]
  • 09:41 joal@deploy2002: Finished deploy [analytics/refinery@dba67fd]: Additional analytics weekly train [analytics/refinery@dba67fd6] (duration: 13m 16s)
  • 09:41 filippo@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:41 filippo@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T357189)', diff saved to https://phabricator.wikimedia.org/P58029 and previous config saved to /var/cache/conftool/dbconfig/20240228-094103-arnaudb.json
  • 09:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:41 filippo@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:41 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
  • 09:41 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host testvm2006.codfw.wmnet
  • 09:40 filippo@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T357189)', diff saved to https://phabricator.wikimedia.org/P58028 and previous config saved to /var/cache/conftool/dbconfig/20240228-094041-arnaudb.json
  • 09:40 filippo@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:39 filippo@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:34 moritzm: installing monitoring-plugins bugfix updates from Bookworm point update
  • 09:28 joal@deploy2002: Started deploy [analytics/refinery@dba67fd]: Additional analytics weekly train [analytics/refinery@dba67fd6]
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P58027 and previous config saved to /var/cache/conftool/dbconfig/20240228-092535-arnaudb.json
  • 09:25 volans: installed spicerack 8.4.0 on cumin2002
  • 09:23 slyngshede@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test2003.wikimedia.org
  • 09:23 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test2003.wikimedia.org with OS bookworm
  • 09:15 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2003.wikimedia.org with reason: host reimage
  • 09:14 moritzm: installing perl security updates on bullseye
  • 09:13 volans: temporarily disabling puppet on cumin1002
  • 09:12 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2003.wikimedia.org with reason: host reimage
  • 09:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P58026 and previous config saved to /var/cache/conftool/dbconfig/20240228-091029-arnaudb.json
  • 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T357189)', diff saved to https://phabricator.wikimedia.org/P58025 and previous config saved to /var/cache/conftool/dbconfig/20240228-085523-arnaudb.json
  • 08:55 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2003.wikimedia.org with OS bookworm
  • 08:52 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2003.wikimedia.org - slyngshede@cumin1002"
  • 08:51 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2003.wikimedia.org - slyngshede@cumin1002"
  • 08:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2003.wikimedia.org on all recursors
  • 08:51 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2003.wikimedia.org on all recursors
  • 08:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2003.wikimedia.org - slyngshede@cumin1002"
  • 08:50 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2003.wikimedia.org - slyngshede@cumin1002"
  • 08:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T357189)', diff saved to https://phabricator.wikimedia.org/P58024 and previous config saved to /var/cache/conftool/dbconfig/20240228-084731-arnaudb.json
  • 08:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58023 and previous config saved to /var/cache/conftool/dbconfig/20240228-084322-root.json
  • 08:28 kartik@deploy2002: Finished scap: Backport for Enable Section Translation on newly created Wikipedias by default (T298235), Enable SectionTranslation for Wikipedias where ContentTranslation is in beta (T353734) (duration: 12m 59s)
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58022 and previous config saved to /var/cache/conftool/dbconfig/20240228-082817-root.json
  • 08:02 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 08:02 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2003.wikimedia.org
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58020 and previous config saved to /var/cache/conftool/dbconfig/20240228-075807-root.json
  • 07:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2187.codfw.wmnet with OS bookworm
  • 07:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2186.codfw.wmnet with OS bookworm
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58018 and previous config saved to /var/cache/conftool/dbconfig/20240228-074302-root.json
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2156 T358640', diff saved to https://phabricator.wikimedia.org/P58017 and previous config saved to /var/cache/conftool/dbconfig/20240228-074259-root.json
  • 07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P58016 and previous config saved to /var/cache/conftool/dbconfig/20240228-072757-root.json
  • 07:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2027.codfw.wmnet with OS bookworm
  • 07:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
  • 07:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 07:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: host reimage
  • 07:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2027.codfw.wmnet with reason: host reimage
  • 07:09 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2187.codfw.wmnet with OS bookworm
  • 07:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2027.codfw.wmnet with reason: host reimage
  • 06:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2186.codfw.wmnet with OS bookworm
  • 06:51 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 06:51 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 06:51 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2027.codfw.wmnet with OS bookworm
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2027 T358180', diff saved to https://phabricator.wikimedia.org/P58015 and previous config saved to /var/cache/conftool/dbconfig/20240228-064731-root.json
  • 06:44 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 06:44 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1232 - optimizing revision table T354015', diff saved to https://phabricator.wikimedia.org/P58014 and previous config saved to /var/cache/conftool/dbconfig/20240228-064210-root.json
  • 03:13 slyngshede@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test1003.wikimedia.org
  • 03:12 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test1003.wikimedia.org with OS bookworm
  • 03:05 swfrench@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,service=nginx,name=mw2268.codfw.wmnet
  • 03:03 swfrench@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,cluster=appserver,service=nginx,name=mw2268.codfw.wmnet
  • 02:52 swfrench-wmf: Running 'sudo systemctl start etcdmirror-conftool-eqiad-wmnet.service' on conf2005
  • 02:50 swfrench-wmf: Correction: Actually running 'curl https://conf2005.codfw.wmnet:2379/v2/keys/__replication/conftool -XPUT -d "value=3021126"' on conf2005 in an attempt to unwedge replication
  • 02:47 swfrench-wmf: Running 'curl https://conf2005.codfw.wmnet:2379/v2/keys/__replication -XPUT -d "value=3021126"' on conf2005 in an attempt to unwedge replication
  • 02:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2213.codfw.wmnet with OS bookworm
  • 02:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2218.codfw.wmnet with OS bookworm
  • 02:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2217.codfw.wmnet with OS bookworm
  • 01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2220.codfw.wmnet with OS bookworm
  • 01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2210.codfw.wmnet with OS bookworm
  • 01:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2212.codfw.wmnet with OS bookworm
  • 01:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2213.codfw.wmnet with reason: host reimage
  • 01:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:48 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2211.codfw.wmnet with OS bookworm
  • 01:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2219.codfw.wmnet with OS bookworm
  • 01:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2218.codfw.wmnet with reason: host reimage
  • 01:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2214.codfw.wmnet with OS bookworm
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: host reimage
  • 01:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2220.codfw.wmnet with reason: host reimage
  • 01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2216.codfw.wmnet with OS bookworm
  • 01:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2209.codfw.wmnet with OS bookworm
  • 01:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2210.codfw.wmnet with reason: host reimage
  • 01:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2212.codfw.wmnet with reason: host reimage
  • 01:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2211.codfw.wmnet with reason: host reimage
  • 01:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2219.codfw.wmnet with reason: host reimage
  • 01:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: host reimage
  • 01:26 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2219.codfw.wmnet with reason: host reimage
  • 01:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2216.codfw.wmnet with reason: host reimage
  • 01:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: host reimage
  • 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2212.codfw.wmnet with reason: host reimage
  • 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2217.codfw.wmnet with reason: host reimage
  • 01:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2213.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2218.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2210.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2214.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2206.codfw.wmnet with OS bookworm
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2216.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2220.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2211.codfw.wmnet with reason: host reimage
  • 01:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: host reimage
  • 01:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2208.codfw.wmnet with OS bookworm
  • 01:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2207.codfw.wmnet with OS bookworm
  • 01:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2204.codfw.wmnet with OS bookworm
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2205.codfw.wmnet with OS bookworm
  • 01:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2203.codfw.wmnet with OS bookworm
  • 01:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2206.codfw.wmnet with reason: host reimage
  • 01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2208.codfw.wmnet with reason: host reimage
  • 00:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: host reimage
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2220.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2219.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2218.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2217.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2216.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2215.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2214.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2213.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2212.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2211.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2210.codfw.wmnet with OS bookworm
  • 00:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2209.codfw.wmnet with OS bookworm
  • 00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2204.codfw.wmnet with reason: host reimage
  • 00:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
  • 00:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2204.codfw.wmnet with reason: host reimage
  • 00:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2208.codfw.wmnet with reason: host reimage
  • 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2206.codfw.wmnet with reason: host reimage
  • 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: host reimage
  • 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: host reimage
  • 00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2203.codfw.wmnet with reason: host reimage
  • 00:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2203.codfw.wmnet with reason: host reimage
  • 00:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2198.codfw.wmnet with OS bookworm
  • 00:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2208.codfw.wmnet with OS bookworm
  • 00:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2207.codfw.wmnet with OS bookworm
  • 00:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2206.codfw.wmnet with OS bookworm
  • 00:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2205.codfw.wmnet with OS bookworm
  • 00:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2204.codfw.wmnet with OS bookworm
  • 00:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2203.codfw.wmnet with OS bookworm
  • 00:10 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 00:10 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 00:08 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 00:08 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 00:08 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 00:07 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 00:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on wdqs1011.eqiad.wmnet with reason: T355617
  • 00:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 15:00:00 on wdqs1011.eqiad.wmnet with reason: T355617
  • 00:02 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host contint1003.eqiad.wmnet
  • 00:02 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint1003.eqiad.wmnet with OS bullseye
  • 00:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2199.codfw.wmnet with OS bookworm
  • 00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2202.codfw.wmnet with OS bookworm
  • 00:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"

2024-02-27

  • 23:57 mutante: T358237 - manually went through the "fix forward" steps from T349619 (install puppet-agent package, delete old key material, create new CSR, sign on puppetserver, node clean on puppetmaster) to fix puppet failures while the makevm cookbook was still running (it couldn't find a successful puppet run)
  • 23:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2201.codfw.wmnet with OS bookworm
  • 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:52 mutante: T358237 - creating VM with cookbook fails because puppet runs have certificate issue, applied role is already migrated to puppet 7 though
  • 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2200.codfw.wmnet with OS bookworm
  • 23:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2199.codfw.wmnet with reason: host reimage
  • 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2202.codfw.wmnet with reason: host reimage
  • 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2201.codfw.wmnet with reason: host reimage
  • 23:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2199.codfw.wmnet with reason: host reimage
  • 23:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2202.codfw.wmnet with reason: host reimage
  • 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2200.codfw.wmnet with reason: host reimage
  • 23:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2201.codfw.wmnet with reason: host reimage
  • 23:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2200.codfw.wmnet with reason: host reimage
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2202.codfw.wmnet with OS bookworm
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2201.codfw.wmnet with OS bookworm
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2200.codfw.wmnet with OS bookworm
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2199.codfw.wmnet with OS bookworm
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2198.codfw.wmnet with OS bookworm
  • 23:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2197.codfw.wmnet with OS bookworm
  • 22:47 mutante: DNS - added new project language "bew" - Betawi, also known as Betawi Malay, Jakartan Malay, or Batavian Malay, is the spoken language of the Betawi people in Jakarta, Indonesia, with an estimated 5 million native speakers. T357866
  • 22:44 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint1003.eqiad.wmnet with reason: host reimage
  • 22:41 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on contint1003.eqiad.wmnet with reason: host reimage
  • 22:32 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host contint1003.eqiad.wmnet with OS bullseye
  • 22:31 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 22:30 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 22:30 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) contint1003.eqiad.wmnet on all recursors
  • 22:30 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache contint1003.eqiad.wmnet on all recursors
  • 22:30 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:30 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 22:29 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 22:24 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 22:24 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host contint1003.eqiad.wmnet
  • 20:51 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 20:51 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 20:50 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:50 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:48 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 20:48 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 20:48 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 20:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:45 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:45 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:43 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:41 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:40 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 19:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T347624, testing 961878 patch) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T352010)', diff saved to https://phabricator.wikimedia.org/P58012 and previous config saved to /var/cache/conftool/dbconfig/20240227-194021-ladsgroup.json
  • 19:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T347624, testing 961878 patch) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:26 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.20 refs T354438
  • 18:57 tchin: finished deploying refinery successfully
  • 18:53 tchin@deploy2002: Finished deploy [analytics/refinery@ac9fd7b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ac9fd7b4] (duration: 03m 42s)
  • 18:50 tchin@deploy2002: Started deploy [analytics/refinery@ac9fd7b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ac9fd7b4]
  • 18:50 tchin@deploy2002: Finished deploy [analytics/refinery@ac9fd7b] (thin): Regular analytics weekly train THIN [analytics/refinery@ac9fd7b4] (duration: 00m 06s)
  • 18:49 tchin@deploy2002: Started deploy [analytics/refinery@ac9fd7b] (thin): Regular analytics weekly train THIN [analytics/refinery@ac9fd7b4]
  • 18:49 tchin@deploy2002: Finished deploy [analytics/refinery@ac9fd7b]: Regular analytics weekly train [analytics/refinery@ac9fd7b4] (duration: 00m 18s)
  • 18:49 tchin@deploy2002: Started deploy [analytics/refinery@ac9fd7b]: Regular analytics weekly train [analytics/refinery@ac9fd7b4]
  • 18:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd1003.eqiad.wmnet with OS bookworm
  • 18:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logging-hd1001.eqiad.wmnet with OS bookworm
  • 18:48 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host logging-hd1001.eqiad.wmnet with OS bookworm
  • 18:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd1001.eqiad.wmnet with OS bookworm
  • 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:38 tchin: rolled back refinery deployment, failed on stat1010 and stat1011
  • 18:37 tchin@deploy2002: Finished deploy [analytics/refinery@ac9fd7b]: Regular analytics weekly train [analytics/refinery@ac9fd7b4] (duration: 09m 51s)
  • 18:27 tchin@deploy2002: Started deploy [analytics/refinery@ac9fd7b]: Regular analytics weekly train [analytics/refinery@ac9fd7b4]
  • 18:25 tchin: deploying refinery
  • 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host logging-hd1002.eqiad.wmnet with OS bookworm
  • 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd1003.eqiad.wmnet with reason: host reimage
  • 18:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd1001.eqiad.wmnet with reason: host reimage
  • 18:22 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 18:21 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 18:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd1003.eqiad.wmnet with reason: host reimage
  • 18:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd1001.eqiad.wmnet with reason: host reimage
  • 18:18 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 18:17 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 18:15 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 18:15 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logging-hd1002.eqiad.wmnet with reason: host reimage
  • 17:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on logging-hd1002.eqiad.wmnet with reason: host reimage
  • 17:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host logging-hd1001.eqiad.wmnet with OS bookworm
  • 17:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host logging-hd1003.eqiad.wmnet with OS bookworm
  • 17:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host logging-hd1002.eqiad.wmnet with OS bookworm
  • 17:23 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint1004.eqiad.wmnet
  • 17:23 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:22 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 17:19 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 17:14 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 17:09 dzahn@cumin1002: START - Cookbook sre.hosts.decommission for hosts contint1004.eqiad.wmnet
  • 17:08 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts contint1003.eqiad.wmnet
  • 17:08 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:08 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 17:07 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: contint1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1002"
  • 17:05 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58011 and previous config saved to /var/cache/conftool/dbconfig/20240227-170342-arnaudb.json
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58010 and previous config saved to /var/cache/conftool/dbconfig/20240227-170330-arnaudb.json
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58009 and previous config saved to /var/cache/conftool/dbconfig/20240227-170312-arnaudb.json
  • 17:01 effie: pool citoid eqiad back
  • 17:01 dzahn@cumin1002: START - Cookbook sre.hosts.decommission for hosts contint1003.eqiad.wmnet
  • 17:01 jiji@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=citoid,name=eqiad
  • 16:51 claime: Repooling mw2324.codfw.wmnet,mw2323.codfw.wmnet,mw2259.codfw.wmnet,mw2261.codfw.wmnet,mw2262.codfw.wmnet,mw2263.codfw.wmnet,mw2264.codfw.wmnet,mw2265.codfw.wmnet,mw2266.codfw.wmnet,mw2268.codfw.wmnet,mw2269.codfw.wmnet,mw2270.codfw.wmnet,mw2314.codfw.wmnet,mw2315.codfw.wmnet,mw2316.codfw.wmnet,mw2320.codfw.wmnet,mw2321.codfw.wmnet,mw2322.codfw.wmnet for T355870
  • 16:49 claime: Uncordoning mw2260.codfw.wmnet mw2267.codfw.wmnet mw2310.codfw.wmnet mw2311.codfw.wmnet mw2312.codfw.wmnet mw2313.codfw.wmnet mw2317.codfw.wmnet mw2318.codfw.wmnet mw2319.codfw.wmnet kubernetes2030.codfw.wmnet kubernetes2029.codfw.wmnet kubernetes2057.codfw.wmnet for T355870
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58008 and previous config saved to /var/cache/conftool/dbconfig/20240227-164837-arnaudb.json
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58007 and previous config saved to /var/cache/conftool/dbconfig/20240227-164825-arnaudb.json
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58006 and previous config saved to /var/cache/conftool/dbconfig/20240227-164808-arnaudb.json
  • 16:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host logging-hd1002.eqiad.wmnet with OS bookworm
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58005 and previous config saved to /var/cache/conftool/dbconfig/20240227-163332-arnaudb.json
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58004 and previous config saved to /var/cache/conftool/dbconfig/20240227-163320-arnaudb.json
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2196.codfw.wmnet with OS bookworm
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58003 and previous config saved to /var/cache/conftool/dbconfig/20240227-163303-arnaudb.json
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host logging-hd1002.eqiad.wmnet with OS bookworm
  • 16:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host logging-hd1001.eqiad.wmnet with OS bookworm
  • 16:23 fabfur: restarting pybal on lvs2014,lvs2011,lvs2012 and lvs2013 for T355544
  • 16:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1039.eqiad.wmnet with OS bookworm
  • 16:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58002 and previous config saved to /var/cache/conftool/dbconfig/20240227-161827-arnaudb.json
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2108 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58001 and previous config saved to /var/cache/conftool/dbconfig/20240227-161815-arnaudb.json
  • 16:17 arnaudb@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P58000 and previous config saved to /var/cache/conftool/dbconfig/20240227-161758-arnaudb.json
  • 16:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2196.codfw.wmnet with reason: host reimage
  • 16:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2196.codfw.wmnet with reason: host reimage
  • 16:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: host reimage
  • 16:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: host reimage
  • 15:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2196.codfw.wmnet with OS bookworm
  • 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2196.codfw.wmnet with OS bookworm
  • 15:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2196.codfw.wmnet with OS bookworm
  • 15:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 36 hosts with reason: Migrating servers in codfw rack B3 to lsw1-b3-codfw
  • 15:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 36 hosts with reason: Migrating servers in codfw rack B3 to lsw1-b3-codfw
  • 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b3-codfw.mgmt with reason: prepping for server uplink migration codfw rack b3
  • 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b3-codfw.mgmt with reason: prepping for server uplink migration codfw rack b3
  • 15:51 jiji@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM urldownloader1003.wikimedia.org
  • 15:46 jiji@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM urldownloader1003.wikimedia.org
  • 15:45 effie: reboot urldownloader1003 - T358597
  • 15:41 topranks: configuring lsw1-b3-codfw in advance of server migration T355870
  • 15:39 arnaudb@cumin1002: dbctl commit (dc=all): 'T355870 - depooling es2021 db2108 db2123', diff saved to https://phabricator.wikimedia.org/P57999 and previous config saved to /var/cache/conftool/dbconfig/20240227-153951-arnaudb.json
  • 15:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on db[2108,2123].codfw.wmnet,es2021.codfw.wmnet with reason: Silence for network maintenance T355870
  • 15:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on db[2108,2123].codfw.wmnet,es2021.codfw.wmnet with reason: Silence for network maintenance T355870
  • 15:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm2001.codfw.wmnet
  • 15:24 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:24 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cmooney@cumin1002"
  • 15:23 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cmooney@cumin1002"
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1038.eqiad.wmnet with OS bookworm
  • 15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:22 moritzm: copy prometheus-mcrouter-exporter from bullseye-wikimedia to bookworm-wikimedia T357748
  • 15:21 claime: Extending vg-root on remaining small disk codfw jobrunners
  • 15:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:20 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) es1040.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: START - Cookbook sre.dns.wipe-cache es1040.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) es1039.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: START - Cookbook sre.dns.wipe-cache es1039.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) es1038.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: START - Cookbook sre.dns.wipe-cache es1038.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) es1037.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: START - Cookbook sre.dns.wipe-cache es1037.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) es1036.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: START - Cookbook sre.dns.wipe-cache es1036.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) es1035.eqiad.wmnet on all recursors
  • 15:20 volans@cumin1002: START - Cookbook sre.dns.wipe-cache es1035.eqiad.wmnet on all recursors
  • 15:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1037.eqiad.wmnet with OS bookworm
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2220.codfw.wmnet on all recursors
  • 15:19 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:19 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2220.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2219.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2219.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2218.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2218.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2217.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2217.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2216.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2216.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2215.codfw.wmnet on all recursors
  • 15:19 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:19 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2215.codfw.wmnet on all recursors
  • 15:19 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2214.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2214.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2213.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2213.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2212.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2212.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2211.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2211.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2210.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2210.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2209.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2209.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2208.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2208.codfw.wmnet on all recursors
  • 15:18 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2207.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2207.codfw.wmnet on all recursors
  • 15:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:17 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2206.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2206.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2205.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2205.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2204.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2204.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2203.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2203.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2202.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2202.codfw.wmnet on all recursors
  • 15:17 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2201.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2201.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2200.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2200.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2199.codfw.wmnet on all recursors
  • 15:16 claime: Cleaning up old tmp media files on codfw jobrunners
  • 15:16 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2199.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2198.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2198.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2197.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2197.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2196.codfw.wmnet on all recursors
  • 15:16 volans@cumin1002: START - Cookbook sre.dns.wipe-cache db2196.codfw.wmnet on all recursors
  • 15:13 cmooney@cumin1002: START - Cookbook sre.hosts.decommission for hosts testvm2001.codfw.wmnet
  • 15:11 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Deleted AAAA records from new DBs - volans@cumin1002"
  • 15:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS bookworm
  • 15:10 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Deleted AAAA records from new DBs - volans@cumin1002"
  • 15:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1039.eqiad.wmnet with OS bookworm
  • 15:08 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 15:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1038.eqiad.wmnet with reason: host reimage
  • 15:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1038.eqiad.wmnet with reason: host reimage
  • 15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1037.eqiad.wmnet with reason: host reimage
  • 15:00 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest2004.codfw.wmnet
  • 15:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cmooney@cumin1002"
  • 14:59 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2004.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cmooney@cumin1002"
  • 14:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1037.eqiad.wmnet with reason: host reimage
  • 14:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS bookworm
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 14:52 claime: Draining mw2260.codfw.wmnet mw2267.codfw.wmnet mw2310.codfw.wmnet mw2311.codfw.wmnet mw2312.codfw.wmnet mw2313.codfw.wmnet mw2317.codfw.wmnet mw2318.codfw.wmnet mw2319.codfw.wmnet kubernetes2030.codfw.wmnet kubernetes2029.codfw.wmnet kubernetes2057.codfw.wmnet for T355870
  • 14:52 cmooney@cumin1002: START - Cookbook sre.hosts.decommission for hosts sretest2004.codfw.wmnet
  • 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 14:50 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1038.eqiad.wmnet with OS bookworm
  • 14:50 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2004.codfw.wmnet with OS bookworm
  • 14:47 claime: Depooling mw2324.codfw.wmnet,mw2323.codfw.wmnet,mw2259.codfw.wmnet,mw2261.codfw.wmnet,mw2262.codfw.wmnet,mw2263.codfw.wmnet,mw2264.codfw.wmnet,mw2265.codfw.wmnet,mw2266.codfw.wmnet,mw2268.codfw.wmnet,mw2269.codfw.wmnet,mw2270.codfw.wmnet,mw2314.codfw.wmnet,mw2315.codfw.wmnet,mw2316.codfw.wmnet,mw2320.codfw.wmnet,mw2321.codfw.wmnet,mw2322.codfw.wmnet for T355870
  • 14:45 claime: disregard previous depooling message for T355544
  • 14:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1037.eqiad.wmnet with OS bookworm
  • 14:41 volans: uploaded spicerack_8.4.0 to apt.wikimedia.org bullseye-wikimedia
  • 14:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 14:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 14:39 claime: depooling mw2325.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet,mw2330.codfw.wmnet,mw2331.codfw.wmnet,mw2332.codfw.wmnet,mw2333.codfw.wmnet,mw2334.codfw.wmnet for T355544
  • 14:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1035.eqiad.wmnet with OS bookworm
  • 14:36 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:35 claime: Adding 20G to root lv on mw2279
  • 14:33 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bookworm
  • 14:32 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2004.codfw.wmnet on all recursors
  • 14:32 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2004.codfw.wmnet on all recursors
  • 14:32 fabfur: restarting pybal on lvs2014,lvs2011,lvs2012 and lvs2013 for T355544
  • 14:29 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 jiji@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=citoid,name=eqiad
  • 14:28 effie: depool citoid eqiad
  • 14:28 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2004.wikimedia.org on all recursors
  • 14:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2004.wikimedia.org on all recursors
  • 14:27 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 14:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:24 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 14:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 14:23 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 14:23 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 14:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2004 - cmooney@cumin1002"
  • 14:20 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2004 - cmooney@cumin1002"
  • 14:19 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2001.codfw.wmnet
  • 14:19 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2001.codfw.wmnet with OS bookworm
  • 14:18 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:11 herron: pyrra upgraded to 0.7.4-2 T351111
  • 14:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 14:09 effie: force restarted all citoid pods in eqiad
  • 14:08 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
  • 14:08 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: sync
  • 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 14:07 effie: force restarted all zotero pods in eqiad
  • 14:06 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 14:06 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 14:05 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2001.codfw.wmnet with reason: host reimage
  • 14:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2001.codfw.wmnet with reason: host reimage
  • 13:50 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2001.codfw.wmnet with OS bookworm
  • 13:19 XioNoX: remove unused 208.80.154.143/32 - 208.80.153.47/32 - 208.80.153.50/32 from Netbox
  • 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2001.codfw.wmnet - cmooney@cumin1002"
  • 13:16 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2001.codfw.wmnet - cmooney@cumin1002"
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2001.codfw.wmnet on all recursors
  • 13:16 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2001.codfw.wmnet on all recursors
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2001.codfw.wmnet - cmooney@cumin1002"
  • 13:15 taavi@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 13:14 taavi@cumin1002: conftool action : set/pooled=inactive; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 13:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2001.codfw.wmnet - cmooney@cumin1002"
  • 13:12 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 13:12 cmooney@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2001.codfw.wmnet
  • 12:48 claime: restarting apache2 on mw2278
  • 12:42 claime: restarting apache2 on mw2281
  • 12:40 cgoubert@cumin2002: conftool action : set/weight=25; selector: cluster=jobrunner,dc=codfw,name=mw22(59|63|64|65|66|78|79|81).*
  • 12:39 cgoubert@cumin2002: conftool action : set/weight=25; selector: cluster=videoscaler,dc=codfw,name=mw22(59|63|64|65|66|78|79|81).*
  • 12:39 claime: rebalancing videoscaler cluster: all E5-2650 to weight 25
  • 12:31 claime: Lowered weight and restarted apache on mw2281.codfw.wmnet
  • 12:30 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=mw2281.codfw.wmnet,cluster=videoscaler,dc=codfw
  • 12:30 cgoubert@cumin2002: conftool action : set/pooled=no; selector: name=mw2281.codfw.wmnet,cluster=videoscaler,dc=codfw
  • 12:29 cgoubert@cumin2002: conftool action : set/weight=20; selector: name=mw2281.codfw.wmnet,cluster=videoscaler,dc=codfw
  • 12:28 moritzm: installing perl security updates on bullseye
  • 12:23 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:23 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 12:23 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:22 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:22 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 12:22 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 12:21 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 12:20 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 12:18 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:17 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 12:17 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:15 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:15 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 12:14 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1003.wikimedia.org with reason: host reimage
  • 12:14 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 12:14 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 12:13 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 12:11 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1003.wikimedia.org with reason: host reimage
  • 12:10 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 12:10 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 12:09 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:09 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:09 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 12:08 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 12:08 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 12:08 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host es1025.eqiad.wmnet
  • 12:05 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 12:04 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 12:02 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1003.wikimedia.org with OS bookworm
  • 12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1003.wikimedia.org - slyngshede@cumin1002"
  • 12:01 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1003.wikimedia.org - slyngshede@cumin1002"
  • 12:00 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1003.wikimedia.org on all recursors
  • 12:00 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test1003.wikimedia.org on all recursors
  • 12:00 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:00 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1003.wikimedia.org - slyngshede@cumin1002"
  • 11:59 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1003.wikimedia.org - slyngshede@cumin1002"
  • 11:58 claime: Expanding root lv on mw2281,mw2278 by 20G
  • 11:57 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 11:57 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test1003.wikimedia.org
  • 11:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:47 akosiaris@deploy2002: Synchronized tests/src/ClusterConfigTest.php: (no justification provided) (duration: 09m 36s)
  • 11:46 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host es1025.eqiad.wmnet
  • 11:44 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:43 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 11:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 11:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 11:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2186.codfw.wmnet
  • 11:09 jynus@cumin1002: dbctl commit (dc=all): 'Repool db2117', diff saved to https://phabricator.wikimedia.org/P57997 and previous config saved to /var/cache/conftool/dbconfig/20240227-110952-jynus.json
  • 11:08 jynus@cumin1002: dbctl commit (dc=all): 'Depool db2117', diff saved to https://phabricator.wikimedia.org/P57996 and previous config saved to /var/cache/conftool/dbconfig/20240227-110828-jynus.json
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2186.codfw.wmnet
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2185.codfw.wmnet
  • 10:41 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2185.codfw.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2173.codfw.wmnet
  • 10:20 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1037.eqiad.wmnet with OS bookworm
  • 10:20 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1039.eqiad.wmnet with OS bookworm
  • 10:17 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1038.eqiad.wmnet with OS bookworm
  • 10:16 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS bookworm
  • 10:15 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1039.eqiad.wmnet with OS bookworm
  • 10:14 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1037.eqiad.wmnet with OS bookworm
  • 10:13 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1037.eqiad.wmnet with OS bookworm
  • 10:11 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 10:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1040.eqiad.wmnet with OS bookworm
  • 10:11 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1002"
  • 10:10 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2173.codfw.wmnet
  • 10:10 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1038.eqiad.wmnet with OS bookworm
  • 10:10 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1002"
  • 10:09 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1038.eqiad.wmnet with OS bookworm
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db1151.eqiad.wmnet
  • 10:06 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 10:03 jnuche@deploy2002: Finished scap: Backport for In RequestContext::setUser() also reset $this->skinName (T336504) (duration: 10m 12s)
  • 10:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1036.eqiad.wmnet with OS bookworm
  • 10:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 10:02 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:01 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 10:01 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:01 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:01 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:00 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: host reimage
  • 09:55 jnuche@deploy2002: jnuche and tstarling: Continuing with sync
  • 09:54 jnuche@deploy2002: jnuche and tstarling: Backport for In RequestContext::setUser() also reset $this->skinName (T336504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1036.eqiad.wmnet with OS bookworm
  • 09:53 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1002"
  • 09:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: host reimage
  • 09:53 jnuche@deploy2002: Started scap: Backport for In RequestContext::setUser() also reset $this->skinName (T336504)
  • 09:44 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - marostegui@cumin1002"
  • 09:39 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1040.eqiad.wmnet with OS bookworm
  • 09:39 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS bookworm
  • 09:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1038.eqiad.wmnet with OS bookworm
  • 09:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1037.eqiad.wmnet with OS bookworm
  • 09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1036.eqiad.wmnet with reason: host reimage
  • 09:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1036.eqiad.wmnet with reason: host reimage
  • 09:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1035.eqiad.wmnet with OS bookworm
  • 09:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db1151.eqiad.wmnet
  • 09:14 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1036.eqiad.wmnet with OS bookworm
  • 09:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
  • 09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2134.codfw.wmnet
  • 08:56 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS bookworm
  • 08:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2134.codfw.wmnet
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57995 and previous config saved to /var/cache/conftool/dbconfig/20240227-085113-root.json
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host db2132.codfw.wmnet
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57994 and previous config saved to /var/cache/conftool/dbconfig/20240227-083608-root.json
  • 08:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host db2132.codfw.wmnet
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57992 and previous config saved to /var/cache/conftool/dbconfig/20240227-080559-root.json
  • 08:05 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 08:05 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 08:00 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host dbproxy2001.codfw.wmnet
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57991 and previous config saved to /var/cache/conftool/dbconfig/20240227-075054-root.json
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57990 and previous config saved to /var/cache/conftool/dbconfig/20240227-073549-root.json
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 1%: After migration to 10.6 T358180', diff saved to https://phabricator.wikimedia.org/P57989 and previous config saved to /var/cache/conftool/dbconfig/20240227-072044-root.json
  • 07:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2029.codfw.wmnet with OS bookworm
  • 06:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:42 XioNoX: Netbox: set ENFORCE_GLOBAL_UNIQUE to True - T336275
  • 06:41 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2029.codfw.wmnet with OS bookworm
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2029 T358180', diff saved to https://phabricator.wikimedia.org/P57988 and previous config saved to /var/cache/conftool/dbconfig/20240227-063707-root.json
  • 06:35 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 06:35 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 06:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Master upgrade x2 T353499
  • 06:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Master upgrade x2 T353499
  • 06:06 kart_: cxserver: Removed dictionary support
  • 05:49 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:48 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:46 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:46 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 04:56 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.20 refs T354438 (duration: 52m 18s)
  • 04:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T357189)', diff saved to https://phabricator.wikimedia.org/P57987 and previous config saved to /var/cache/conftool/dbconfig/20240227-042703-arnaudb.json
  • 04:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P57986 and previous config saved to /var/cache/conftool/dbconfig/20240227-041156-arnaudb.json
  • 04:04 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.20 refs T354438
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.17 (duration: 02m 00s)
  • 03:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P57985 and previous config saved to /var/cache/conftool/dbconfig/20240227-035650-arnaudb.json
  • 03:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T357189)', diff saved to https://phabricator.wikimedia.org/P57984 and previous config saved to /var/cache/conftool/dbconfig/20240227-034144-arnaudb.json
  • 03:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T357189)', diff saved to https://phabricator.wikimedia.org/P57983 and previous config saved to /var/cache/conftool/dbconfig/20240227-032037-arnaudb.json
  • 03:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 03:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 03:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T357189)', diff saved to https://phabricator.wikimedia.org/P57982 and previous config saved to /var/cache/conftool/dbconfig/20240227-032015-arnaudb.json
  • 03:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P57981 and previous config saved to /var/cache/conftool/dbconfig/20240227-030508-arnaudb.json
  • 02:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P57980 and previous config saved to /var/cache/conftool/dbconfig/20240227-025002-arnaudb.json
  • 02:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T357189)', diff saved to https://phabricator.wikimedia.org/P57979 and previous config saved to /var/cache/conftool/dbconfig/20240227-023456-arnaudb.json
  • 02:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T357189)', diff saved to https://phabricator.wikimedia.org/P57978 and previous config saved to /var/cache/conftool/dbconfig/20240227-021357-arnaudb.json
  • 02:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 02:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 02:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57977 and previous config saved to /var/cache/conftool/dbconfig/20240227-021333-arnaudb.json
  • 01:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P57976 and previous config saved to /var/cache/conftool/dbconfig/20240227-015827-arnaudb.json
  • 01:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P57975 and previous config saved to /var/cache/conftool/dbconfig/20240227-014321-arnaudb.json
  • 01:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57974 and previous config saved to /var/cache/conftool/dbconfig/20240227-012814-arnaudb.json
  • 01:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57973 and previous config saved to /var/cache/conftool/dbconfig/20240227-010344-arnaudb.json
  • 01:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 01:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 01:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T357189)', diff saved to https://phabricator.wikimedia.org/P57972 and previous config saved to /var/cache/conftool/dbconfig/20240227-010321-arnaudb.json
  • 00:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P57971 and previous config saved to /var/cache/conftool/dbconfig/20240227-004815-arnaudb.json
  • 00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P57970 and previous config saved to /var/cache/conftool/dbconfig/20240227-003309-arnaudb.json
  • 00:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T357189)', diff saved to https://phabricator.wikimedia.org/P57969 and previous config saved to /var/cache/conftool/dbconfig/20240227-001802-arnaudb.json
  • 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
  • 00:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage

2024-02-26

  • 23:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS bookworm
  • 23:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T357189)', diff saved to https://phabricator.wikimedia.org/P57968 and previous config saved to /var/cache/conftool/dbconfig/20240226-235539-arnaudb.json
  • 23:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T357189)', diff saved to https://phabricator.wikimedia.org/P57967 and previous config saved to /var/cache/conftool/dbconfig/20240226-235500-arnaudb.json
  • 23:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P57966 and previous config saved to /var/cache/conftool/dbconfig/20240226-233953-arnaudb.json
  • 23:26 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-redacteddb1001.eqiad.wmnet with OS bookworm
  • 23:26 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 23:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P57965 and previous config saved to /var/cache/conftool/dbconfig/20240226-232443-arnaudb.json
  • 23:11 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
  • 23:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T357189)', diff saved to https://phabricator.wikimedia.org/P57964 and previous config saved to /var/cache/conftool/dbconfig/20240226-230934-arnaudb.json
  • 23:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - ryankemper@cumin2002 - T356651
  • 23:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: host reimage
  • 22:57 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: host reimage
  • 22:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: host reimage
  • 22:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: host reimage
  • 22:46 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - ryankemper@cumin2002 - T356651
  • 22:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T357189)', diff saved to https://phabricator.wikimedia.org/P57963 and previous config saved to /var/cache/conftool/dbconfig/20240226-224557-arnaudb.json
  • 22:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1036.eqiad.wmnet with reason: host reimage
  • 22:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1036.eqiad.wmnet with reason: host reimage
  • 22:42 TimStarling: on snapshot1010, killed PHP processes left over from kill -9 of their python parents (T358458)
  • 22:42 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bookworm
  • 22:41 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-redacteddb1001.eqiad.wmnet with OS bookworm
  • 22:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1040.eqiad.wmnet with OS bookworm
  • 22:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: cloudelastic restart
  • 22:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: cloudelastic restart
  • 22:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
  • 22:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 22:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 22:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T357189)', diff saved to https://phabricator.wikimedia.org/P57962 and previous config saved to /var/cache/conftool/dbconfig/20240226-222435-arnaudb.json
  • 22:24 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1035.eqiad.wmnet with reason: host reimage
  • 22:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1036.eqiad.wmnet with OS bookworm
  • 22:18 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - ryankemper@cumin2002 - T356651
  • 22:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P57961 and previous config saved to /var/cache/conftool/dbconfig/20240226-220928-arnaudb.json
  • 22:06 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (2 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic plugin upgrade - ryankemper@cumin2002 - T356651
  • 22:02 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 37s)
  • 21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS bookworm
  • 21:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P57960 and previous config saved to /var/cache/conftool/dbconfig/20240226-215422-arnaudb.json
  • 21:54 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 26s)
  • 21:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T357189)', diff saved to https://phabricator.wikimedia.org/P57959 and previous config saved to /var/cache/conftool/dbconfig/20240226-213916-arnaudb.json
  • 21:38 cjming@deploy2002: Finished scap: Backport for Fix regression in WebM transcodes breaking audio (T358342) (duration: 11m 14s)
  • 21:30 cjming@deploy2002: cjming and bvibber: Continuing with sync
  • 21:29 cjming@deploy2002: cjming and bvibber: Backport for Fix regression in WebM transcodes breaking audio (T358342) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:27 cjming@deploy2002: Started scap: Backport for Fix regression in WebM transcodes breaking audio (T358342)
  • 21:22 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint1004.eqiad.wmnet with OS bullseye
  • 21:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2109 (T357189)', diff saved to https://phabricator.wikimedia.org/P57958 and previous config saved to /var/cache/conftool/dbconfig/20240226-211619-arnaudb.json
  • 21:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 21:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 21:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T357189)', diff saved to https://phabricator.wikimedia.org/P57957 and previous config saved to /var/cache/conftool/dbconfig/20240226-211557-arnaudb.json
  • 21:10 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint1004.eqiad.wmnet with reason: host reimage
  • 21:07 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on contint1004.eqiad.wmnet with reason: host reimage
  • 21:02 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:02 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P57956 and previous config saved to /var/cache/conftool/dbconfig/20240226-210050-arnaudb.json
  • 20:58 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host contint1004.eqiad.wmnet with OS bullseye
  • 20:58 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host contint1004.eqiad.wmnet
  • 20:57 dzahn@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host contint1004.eqiad.wmnet with OS bullseye
  • 20:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1040.eqiad.wmnet with OS bookworm
  • 20:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1039.eqiad.wmnet with OS bookworm
  • 20:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1038.eqiad.wmnet with OS bookworm
  • 20:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1036.eqiad.wmnet with OS bookworm
  • 20:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1037.eqiad.wmnet with OS bookworm
  • 20:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P57955 and previous config saved to /var/cache/conftool/dbconfig/20240226-204544-arnaudb.json
  • 20:44 mutante: T358237 - used the next hostname number, 1004, to avoid the duplicate IP issue. makevm cookbook is at attempt 103/240 to detect a reboot of the VM and uptime just keeps going up. used the "gnt-instance console --show-cmd" trick to get a console despite https://phabricator.wikimedia.org/T309724 - partman config was missing
  • 20:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1035.eqiad.wmnet with OS bookworm
  • 20:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T357189)', diff saved to https://phabricator.wikimedia.org/P57954 and previous config saved to /var/cache/conftool/dbconfig/20240226-203038-arnaudb.json
  • 20:19 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host contint1004.eqiad.wmnet with OS bullseye
  • 20:18 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm
  • 20:18 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1004.eqiad.wmnet - dzahn@cumin1002"
  • 20:17 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1004.eqiad.wmnet - dzahn@cumin1002"
  • 20:17 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) contint1004.eqiad.wmnet on all recursors
  • 20:17 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache contint1004.eqiad.wmnet on all recursors
  • 20:17 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1004.eqiad.wmnet - dzahn@cumin1002"
  • 20:16 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1004.eqiad.wmnet - dzahn@cumin1002"
  • 20:14 sukhe: running dummy authdns-update
  • 20:12 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 20:12 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host contint1004.eqiad.wmnet
  • 20:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2105 (T357189)', diff saved to https://phabricator.wikimedia.org/P57953 and previous config saved to /var/cache/conftool/dbconfig/20240226-200734-arnaudb.json
  • 20:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 20:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 20:07 bblack@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,service=authdns-update,name=dns3001.wikimedia.org
  • 20:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 20:02 bblack@cumin1002: conftool action : set/pooled=yes; selector: cluster=dnsbox,service=authdns-update,name=dns3003.wikimedia.org
  • 20:01 bblack@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,service=authdns-update,name=dns3003.wikimedia.org
  • 20:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 20:00 bblack@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,service=authdns-update,name=dns3001.wikimedia.org
  • 19:59 bblack@cumin1002: conftool action : set/pooled=yes; selector: cluster=dnsbox,service=authdns-update,name=dns6002.wikimedia.org
  • 19:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:45 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 19:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 19:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 19:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T357189)', diff saved to https://phabricator.wikimedia.org/P57952 and previous config saved to /var/cache/conftool/dbconfig/20240226-194427-arnaudb.json
  • 19:43 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=authdns-update
  • 19:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1040.eqiad.wmnet with OS bookworm
  • 19:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS bookworm
  • 19:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1038.eqiad.wmnet with OS bookworm
  • 19:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1036.eqiad.wmnet with OS bookworm
  • 19:30 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2004.wikimedia.org with OS bookworm
  • 19:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm
  • 19:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P57951 and previous config saved to /var/cache/conftool/dbconfig/20240226-192920-arnaudb.json
  • 19:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1037.eqiad.wmnet with OS bookworm
  • 19:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS bookworm
  • 19:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6002.wikimedia.org,service=authdns-update
  • 19:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 19:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P57950 and previous config saved to /var/cache/conftool/dbconfig/20240226-191414-arnaudb.json
  • 19:13 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
  • 19:11 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 19:10 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync - dzahn@cumin1002"
  • 19:09 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync - dzahn@cumin1002"
  • 19:09 mutante: decom cookbook finishes with exit code 0 but does not remove the DNS record of the virtual machine (T358237)
  • 19:06 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:06 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:05 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host contint1003.eqiad.wmnet
  • 19:04 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:04 mutante: T358237 - makevm cookbook was interrupted by accident. re-running it would create a second IP with the same DNS name; running the decom cookbook also fails, so this is stuck
  • 19:02 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 19:02 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host contint1003.eqiad.wmnet
  • 19:02 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1040
  • 19:02 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1040
  • 19:02 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1039
  • 19:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1038
  • 19:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1039
  • 19:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1038
  • 19:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1037
  • 19:01 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1036
  • 19:01 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1037
  • 19:00 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1036
  • 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es1035
  • 19:00 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es1035
  • 18:59 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:59 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for es1036-40 - jclark@cumin1002"
  • 18:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T357189)', diff saved to https://phabricator.wikimedia.org/P57949 and previous config saved to /var/cache/conftool/dbconfig/20240226-185907-arnaudb.json
  • 18:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for es1036-40 - jclark@cumin1002"
  • 18:55 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host contint1003.eqiad.wmnet with OS bullseye
  • 18:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 18:51 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 18:49 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2004.wikimedia.org with OS bookworm
  • 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T357189)', diff saved to https://phabricator.wikimedia.org/P57948 and previous config saved to /var/cache/conftool/dbconfig/20240226-184903-arnaudb.json
  • 18:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 18:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 18:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T357189)', diff saved to https://phabricator.wikimedia.org/P57947 and previous config saved to /var/cache/conftool/dbconfig/20240226-184841-arnaudb.json
  • 18:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2004.wikimedia.org
  • 18:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1038.eqiad.wmnet with OS bookworm
  • 18:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1039.eqiad.wmnet with OS bookworm
  • 18:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1040.eqiad.wmnet with OS bookworm
  • 18:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1037.eqiad.wmnet with OS bookworm
  • 18:42 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2004.wikimedia.org
  • 18:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1035.eqiad.wmnet with OS bookworm
  • 18:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host es1036.eqiad.wmnet with OS bookworm
  • 18:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P57946 and previous config saved to /var/cache/conftool/dbconfig/20240226-183334-arnaudb.json
  • 18:29 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:29 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:28 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2004.wikimedia.org with OS bookworm
  • 18:19 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:18 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P57945 and previous config saved to /var/cache/conftool/dbconfig/20240226-181827-arnaudb.json
  • 18:16 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:16 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:14 Daimona: T357007 Running mwscript CampaignEvents:GenerateInvitationList --wiki=metawiki --listfile=/home/daimona/list.txt
  • 18:13 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host contint1003.eqiad.wmnet with OS bullseye
  • 18:11 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host contint1003.eqiad.wmnet
  • 18:11 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) contint1003.eqiad.wmnet on all recursors
  • 18:11 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache contint1003.eqiad.wmnet on all recursors
  • 18:11 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:09 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:09 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) contint1003.eqiad.wmnet on all recursors
  • 18:09 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache contint1003.eqiad.wmnet on all recursors
  • 18:09 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:07 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:07 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host contint1003.eqiad.wmnet
  • 18:07 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host contint1003.eqiad.wmnet
  • 18:07 dzahn@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host contint1003.eqiad.wmnet with OS bullseye
  • 18:06 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host contint1003.eqiad.wmnet with OS bullseye
  • 18:06 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 18:06 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 18:06 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) contint1003.eqiad.wmnet on all recursors
  • 18:05 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache contint1003.eqiad.wmnet on all recursors
  • 18:05 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 18:04 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM contint1003.eqiad.wmnet - dzahn@cumin1002"
  • 18:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T357189)', diff saved to https://phabricator.wikimedia.org/P57944 and previous config saved to /var/cache/conftool/dbconfig/20240226-180321-arnaudb.json
  • 18:01 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:00 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host contint1003.eqiad.wmnet
  • 17:59 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 17:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T357189)', diff saved to https://phabricator.wikimedia.org/P57943 and previous config saved to /var/cache/conftool/dbconfig/20240226-175315-arnaudb.json
  • 17:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 17:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 17:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T357189)', diff saved to https://phabricator.wikimedia.org/P57942 and previous config saved to /var/cache/conftool/dbconfig/20240226-175231-arnaudb.json
  • 17:51 sukhe: running dummy authdns-update to confirm working ferm rules
  • 17:41 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:38 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:38 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P57941 and previous config saved to /var/cache/conftool/dbconfig/20240226-173725-arnaudb.json
  • 17:35 denisse: Enabled meta-monitoring for alert1001 - T333615
  • 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1040.eqiad.wmnet with OS bookworm
  • 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1039.eqiad.wmnet with OS bookworm
  • 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1038.eqiad.wmnet with OS bookworm
  • 17:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1037.eqiad.wmnet with OS bookworm
  • 17:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1036.eqiad.wmnet with OS bookworm
  • 17:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host es1035.eqiad.wmnet with OS bookworm
  • 17:22 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on alert2001.wikimedia.org with reason: host reimage
  • 17:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P57940 and previous config saved to /var/cache/conftool/dbconfig/20240226-172218-arnaudb.json
  • 17:18 denisse@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on alert2001.wikimedia.org with reason: host reimage
  • 17:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T357189)', diff saved to https://phabricator.wikimedia.org/P57939 and previous config saved to /var/cache/conftool/dbconfig/20240226-170712-arnaudb.json
  • 17:05 vriley@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['logging-hd1001']
  • 17:04 vriley@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['logging-hd1001']
  • 17:04 vriley@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:03 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host logging-hd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:00 denisse@cumin2002: START - Cookbook sre.hosts.reimage for host alert2001.wikimedia.org with OS bookworm
  • 16:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T357189)', diff saved to https://phabricator.wikimedia.org/P57938 and previous config saved to /var/cache/conftool/dbconfig/20240226-165730-arnaudb.json
  • 16:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 16:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 16:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T357189)', diff saved to https://phabricator.wikimedia.org/P57937 and previous config saved to /var/cache/conftool/dbconfig/20240226-165707-arnaudb.json
  • 16:55 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host logging-hd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:55 sukhe: sudo cumin 'A:dns-rec and not P{dns6001*}' "run-puppet-agent --enable 'merging CR'"
  • 16:54 sukhe: re-enable Puppet on A:dns-rec and run agent
  • 16:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:47 sukhe: disable puppet on A:dns-rec to merge CR 1006532
  • 16:46 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt logging-hd1003 - vriley@cumin1002"
  • 16:46 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: cluster=dnsbox
  • 16:46 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt logging-hd1003 - vriley@cumin1002"
  • 16:45 vriley@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:43 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P57936 and previous config saved to /var/cache/conftool/dbconfig/20240226-164201-arnaudb.json
  • 16:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:39 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:38 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:37 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:37 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt logging-hd1002 - vriley@cumin1002"
  • 16:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt logging-hd1002 - vriley@cumin1002"
  • 16:34 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:30 denisse@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host alert2001.wikimedia.org with OS bookworm
  • 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P57935 and previous config saved to /var/cache/conftool/dbconfig/20240226-162655-arnaudb.json
  • 16:23 vriley@cumin1002: START - Cookbook sre.hosts.provision for host logging-hd1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:22 sukhe: etcd: purging /conftool/v1/dnsbox: old schema, deprecated: T347054
  • 16:20 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es1037']
  • 16:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es1037']
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T357189)', diff saved to https://phabricator.wikimedia.org/P57933 and previous config saved to /var/cache/conftool/dbconfig/20240226-161148-arnaudb.json
  • 16:09 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2004.wikimedia.org with OS bookworm
  • 16:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es1035']
  • 16:05 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es1035']
  • 16:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es1036']
  • 16:04 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es1036']
  • 16:04 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es1037']
  • 16:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es1035']
  • 16:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es1036']
  • 16:03 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es1037']
  • 16:03 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es1036']
  • 16:02 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es1035']
  • 16:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T357189)', diff saved to https://phabricator.wikimedia.org/P57932 and previous config saved to /var/cache/conftool/dbconfig/20240226-160206-arnaudb.json
  • 16:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T357189)', diff saved to https://phabricator.wikimedia.org/P57931 and previous config saved to /var/cache/conftool/dbconfig/20240226-160143-arnaudb.json
  • 15:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P57930 and previous config saved to /var/cache/conftool/dbconfig/20240226-154637-arnaudb.json
  • 15:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P57929 and previous config saved to /var/cache/conftool/dbconfig/20240226-153131-arnaudb.json
  • 15:28 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:27 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:25 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:25 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster - update 3 (duration: 00m 12s)
  • 15:25 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster - update 3
  • 15:23 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:17 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:16 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T357189)', diff saved to https://phabricator.wikimedia.org/P57928 and previous config saved to /var/cache/conftool/dbconfig/20240226-151624-arnaudb.json
  • 15:16 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for es1036-40 - jclark@cumin1002"
  • 15:15 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster - update 2 (duration: 00m 12s)
  • 15:15 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster - update 2
  • 15:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for es1036-40 - jclark@cumin1002"
  • 15:13 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:13 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 15:12 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 15:11 denisse@cumin2002: START - Cookbook sre.hosts.reimage for host alert2001.wikimedia.org with OS bookworm
  • 15:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T357189)', diff saved to https://phabricator.wikimedia.org/P57927 and previous config saved to /var/cache/conftool/dbconfig/20240226-150639-arnaudb.json
  • 15:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 15:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 15:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57926 and previous config saved to /var/cache/conftool/dbconfig/20240226-150606-arnaudb.json
  • 15:03 denisse: Disabling meta-monitoring for the alert hosts - T333615
  • 15:02 denisse: Disabling meta-monitoring for the alert hosts - T333615
  • 14:52 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:52 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:51 fabfur: repooled and reactivated puppet on cp4037 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1006489 (T358105)
  • 14:51 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:51 fabfur@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=(cdn|ats-be)
  • 14:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P57925 and previous config saved to /var/cache/conftool/dbconfig/20240226-145059-arnaudb.json
  • 14:48 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:48 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:48 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:48 cgoubert@deploy2002: Finished scap: Backport for Enable $wgLocalHTTPProxy on all wikis (T298265) (duration: 13m 24s)
  • 14:47 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:47 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:46 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:46 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:46 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:42 fabfur: depooled and deactivated puppet on cp4037 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1006489 (T358105)
  • 14:39 cgoubert@deploy2002: cgoubert: Continuing with sync
  • 14:36 cgoubert@deploy2002: cgoubert: Backport for Enable $wgLocalHTTPProxy on all wikis (T298265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P57924 and previous config saved to /var/cache/conftool/dbconfig/20240226-143553-arnaudb.json
  • 14:34 cgoubert@deploy2002: Started scap: Backport for Enable $wgLocalHTTPProxy on all wikis (T298265)
  • 14:26 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Remove the Collection extension from wikisource (T358437) (duration: 11m 49s)
  • 14:23 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2004.wikimedia.org with OS bookworm
  • 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57923 and previous config saved to /var/cache/conftool/dbconfig/20240226-142046-arnaudb.json
  • 14:17 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and soda: Continuing with sync
  • 14:15 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and soda: Backport for Remove the Collection extension from wikisource (T358437) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Remove the Collection extension from wikisource (T358437)
  • 14:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57922 and previous config saved to /var/cache/conftool/dbconfig/20240226-141107-arnaudb.json
  • 14:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 13:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:09 claime: trafficserver: move 50% of traffic to mw on k8s - T357507
  • 13:06 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:06 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2004.wikimedia.org with OS bookworm
  • 13:06 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2004.wikimedia.org on all recursors
  • 13:05 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2004.wikimedia.org on all recursors
  • 13:04 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 13:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 13:04 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:04 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:04 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:04 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:04 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2004 - cmooney@cumin1002"
  • 13:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:03 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2004 - cmooney@cumin1002"
  • 13:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:55 Dreamy_Jazz: Restarting MediaModeration scanning maintenance script - See https://wikitech.wikimedia.org/wiki/MediaModeration
  • 12:07 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: host reimage
  • 12:04 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-redacteddb1001.eqiad.wmnet with reason: host reimage
  • 11:44 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bookworm
  • 11:42 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-redacteddb1001.eqiad.wmnet with OS bookworm
  • 11:41 claime: Restarting failed mediawiki_job_generatecaptcha
  • 11:20 Lucas_WMDE: STOP persistRevisionThreadItems on viwiki for T315510 again, tons of errors (didn’t even respond to Ctrl+C so I `sudo -u www-data kill`’ed it)
  • 11:18 fabfur: enabled puppet on 'A:cp' to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1005548 (T358105, T358107)
  • 11:18 btullis@cumin1002: END (ERROR) - Cookbook sre.presto.roll-restart-workers (exit_code=97) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:13 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 10:47 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bookworm
  • 10:36 fabfur: enabled puppet on 'A:cp-ulsfo' to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1005548 (T358105, T358107)
  • 10:29 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2442.codfw.wmnet
  • 10:27 taavi: upgrading wikitech-static to mediawiki 1.41 T357880
  • 10:07 moritzm: installing perl security updates
  • 10:04 fabfur: disabled puppet on all cp hosts to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1005548 (T358105, T358107)
  • 09:23 Emperor: unmute the outbound port utilisation over 80% alert T358455
  • 09:12 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2442.codfw.wmnet
  • 09:10 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mw2442.codfw.wmnet
  • 09:10 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2442.codfw.wmnet
  • 09:00 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: Upgrade etherpad and switch to bookworm
  • 09:00 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: Upgrade etherpad and switch to bookworm
  • 08:58 slyngs: IDP switchover to idp2002
  • 08:51 XioNoX: deploy "facebookexternalhit" varnish 403 - T358455

2024-02-25

  • 22:47 Emperor: mute the outbound port utilisation over 80% alert
  • 00:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57920 and previous config saved to /var/cache/conftool/dbconfig/20240225-005423-arnaudb.json
  • 00:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P57919 and previous config saved to /var/cache/conftool/dbconfig/20240225-003916-arnaudb.json
  • 00:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P57918 and previous config saved to /var/cache/conftool/dbconfig/20240225-002410-arnaudb.json
  • 00:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57917 and previous config saved to /var/cache/conftool/dbconfig/20240225-000904-arnaudb.json

2024-02-24

  • 23:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57916 and previous config saved to /var/cache/conftool/dbconfig/20240224-230912-arnaudb.json
  • 23:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 23:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 23:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T357189)', diff saved to https://phabricator.wikimedia.org/P57915 and previous config saved to /var/cache/conftool/dbconfig/20240224-230850-arnaudb.json
  • 22:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P57914 and previous config saved to /var/cache/conftool/dbconfig/20240224-225343-arnaudb.json
  • 22:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P57913 and previous config saved to /var/cache/conftool/dbconfig/20240224-223837-arnaudb.json
  • 22:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T357189)', diff saved to https://phabricator.wikimedia.org/P57912 and previous config saved to /var/cache/conftool/dbconfig/20240224-222331-arnaudb.json
  • 21:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T357189)', diff saved to https://phabricator.wikimedia.org/P57911 and previous config saved to /var/cache/conftool/dbconfig/20240224-212414-arnaudb.json
  • 21:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 21:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T357189)', diff saved to https://phabricator.wikimedia.org/P57910 and previous config saved to /var/cache/conftool/dbconfig/20240224-212336-arnaudb.json
  • 21:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P57909 and previous config saved to /var/cache/conftool/dbconfig/20240224-210830-arnaudb.json
  • 20:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P57908 and previous config saved to /var/cache/conftool/dbconfig/20240224-205323-arnaudb.json
  • 20:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T357189)', diff saved to https://phabricator.wikimedia.org/P57907 and previous config saved to /var/cache/conftool/dbconfig/20240224-203816-arnaudb.json
  • 19:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T357189)', diff saved to https://phabricator.wikimedia.org/P57906 and previous config saved to /var/cache/conftool/dbconfig/20240224-193712-arnaudb.json
  • 19:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 19:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 19:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T357189)', diff saved to https://phabricator.wikimedia.org/P57905 and previous config saved to /var/cache/conftool/dbconfig/20240224-193651-arnaudb.json
  • 19:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P57904 and previous config saved to /var/cache/conftool/dbconfig/20240224-192144-arnaudb.json
  • 19:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P57903 and previous config saved to /var/cache/conftool/dbconfig/20240224-190638-arnaudb.json
  • 18:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T357189)', diff saved to https://phabricator.wikimedia.org/P57902 and previous config saved to /var/cache/conftool/dbconfig/20240224-185132-arnaudb.json
  • 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T357189)', diff saved to https://phabricator.wikimedia.org/P57901 and previous config saved to /var/cache/conftool/dbconfig/20240224-174941-arnaudb.json
  • 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 17:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T357189)', diff saved to https://phabricator.wikimedia.org/P57900 and previous config saved to /var/cache/conftool/dbconfig/20240224-165636-arnaudb.json
  • 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P57899 and previous config saved to /var/cache/conftool/dbconfig/20240224-164129-arnaudb.json
  • 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P57898 and previous config saved to /var/cache/conftool/dbconfig/20240224-162623-arnaudb.json
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T357189)', diff saved to https://phabricator.wikimedia.org/P57897 and previous config saved to /var/cache/conftool/dbconfig/20240224-161117-arnaudb.json
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T357189)', diff saved to https://phabricator.wikimedia.org/P57896 and previous config saved to /var/cache/conftool/dbconfig/20240224-151234-arnaudb.json
  • 15:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T357189)', diff saved to https://phabricator.wikimedia.org/P57895 and previous config saved to /var/cache/conftool/dbconfig/20240224-151212-arnaudb.json
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P57894 and previous config saved to /var/cache/conftool/dbconfig/20240224-145706-arnaudb.json
  • 14:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P57893 and previous config saved to /var/cache/conftool/dbconfig/20240224-144200-arnaudb.json
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T357189)', diff saved to https://phabricator.wikimedia.org/P57892 and previous config saved to /var/cache/conftool/dbconfig/20240224-142653-arnaudb.json
  • 12:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T357189)', diff saved to https://phabricator.wikimedia.org/P57891 and previous config saved to /var/cache/conftool/dbconfig/20240224-124741-arnaudb.json
  • 12:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 12:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 12:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T357189)', diff saved to https://phabricator.wikimedia.org/P57890 and previous config saved to /var/cache/conftool/dbconfig/20240224-124709-arnaudb.json
  • 12:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P57889 and previous config saved to /var/cache/conftool/dbconfig/20240224-123203-arnaudb.json
  • 12:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P57888 and previous config saved to /var/cache/conftool/dbconfig/20240224-121657-arnaudb.json
  • 12:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2196.codfw.wmnet with OS bookworm
  • 12:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T357189)', diff saved to https://phabricator.wikimedia.org/P57887 and previous config saved to /var/cache/conftool/dbconfig/20240224-120150-arnaudb.json
  • 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2119 (T357189)', diff saved to https://phabricator.wikimedia.org/P57886 and previous config saved to /var/cache/conftool/dbconfig/20240224-105413-arnaudb.json
  • 10:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T357189)', diff saved to https://phabricator.wikimedia.org/P57885 and previous config saved to /var/cache/conftool/dbconfig/20240224-105351-arnaudb.json
  • 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2121 from API', diff saved to https://phabricator.wikimedia.org/P57884 and previous config saved to /var/cache/conftool/dbconfig/20240224-104824-marostegui.json
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2118 T358423', diff saved to https://phabricator.wikimedia.org/P57883 and previous config saved to /var/cache/conftool/dbconfig/20240224-104617-root.json
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2121 to s7 primary and set section read-write T358423', diff saved to https://phabricator.wikimedia.org/P57882 and previous config saved to /var/cache/conftool/dbconfig/20240224-104522-marostegui.json
  • 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 codfw as read-only for maintenance - T358423', diff saved to https://phabricator.wikimedia.org/P57881 and previous config saved to /var/cache/conftool/dbconfig/20240224-104440-marostegui.json
  • 10:44 marostegui: Starting s7 codfw emergency failover from db2118 to db2121 - T358423
  • 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P57880 and previous config saved to /var/cache/conftool/dbconfig/20240224-103845-arnaudb.json
  • 10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T358423
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2121 with weight 0 T358423', diff saved to https://phabricator.wikimedia.org/P57879 and previous config saved to /var/cache/conftool/dbconfig/20240224-102401-root.json
  • 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P57878 and previous config saved to /var/cache/conftool/dbconfig/20240224-102338-arnaudb.json
  • 10:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T358423
  • 10:10 taavi: powercycle db2118
  • 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T357189)', diff saved to https://phabricator.wikimedia.org/P57877 and previous config saved to /var/cache/conftool/dbconfig/20240224-100832-arnaudb.json
  • 09:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2110 (T357189)', diff saved to https://phabricator.wikimedia.org/P57876 and previous config saved to /var/cache/conftool/dbconfig/20240224-090212-arnaudb.json
  • 09:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T357189)', diff saved to https://phabricator.wikimedia.org/P57875 and previous config saved to /var/cache/conftool/dbconfig/20240224-090150-arnaudb.json
  • 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P57874 and previous config saved to /var/cache/conftool/dbconfig/20240224-084644-arnaudb.json
  • 08:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P57873 and previous config saved to /var/cache/conftool/dbconfig/20240224-083138-arnaudb.json
  • 07:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2106 (T357189)', diff saved to https://phabricator.wikimedia.org/P57871 and previous config saved to /var/cache/conftool/dbconfig/20240224-071221-arnaudb.json
  • 07:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 06:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T357189)', diff saved to https://phabricator.wikimedia.org/P57870 and previous config saved to /var/cache/conftool/dbconfig/20240224-052320-arnaudb.json
  • 05:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P57869 and previous config saved to /var/cache/conftool/dbconfig/20240224-050814-arnaudb.json
  • 04:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P57868 and previous config saved to /var/cache/conftool/dbconfig/20240224-045307-arnaudb.json
  • 04:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T357189)', diff saved to https://phabricator.wikimedia.org/P57867 and previous config saved to /var/cache/conftool/dbconfig/20240224-043801-arnaudb.json
  • 03:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T357189)', diff saved to https://phabricator.wikimedia.org/P57866 and previous config saved to /var/cache/conftool/dbconfig/20240224-033304-arnaudb.json
  • 03:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 03:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 03:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T357189)', diff saved to https://phabricator.wikimedia.org/P57865 and previous config saved to /var/cache/conftool/dbconfig/20240224-033241-arnaudb.json
  • 03:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P57864 and previous config saved to /var/cache/conftool/dbconfig/20240224-031735-arnaudb.json
  • 03:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P57863 and previous config saved to /var/cache/conftool/dbconfig/20240224-030228-arnaudb.json
  • 02:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T357189)', diff saved to https://phabricator.wikimedia.org/P57862 and previous config saved to /var/cache/conftool/dbconfig/20240224-024722-arnaudb.json
  • 01:47 brett: Upload ncmonitor 0.0.3 to bookworm-wikimedia
  • 01:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T357189)', diff saved to https://phabricator.wikimedia.org/P57861 and previous config saved to /var/cache/conftool/dbconfig/20240224-014734-arnaudb.json
  • 01:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 01:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 01:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T357189)', diff saved to https://phabricator.wikimedia.org/P57860 and previous config saved to /var/cache/conftool/dbconfig/20240224-014711-arnaudb.json
  • 01:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P57859 and previous config saved to /var/cache/conftool/dbconfig/20240224-013205-arnaudb.json
  • 01:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P57858 and previous config saved to /var/cache/conftool/dbconfig/20240224-011658-arnaudb.json
  • 01:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T357189)', diff saved to https://phabricator.wikimedia.org/P57857 and previous config saved to /var/cache/conftool/dbconfig/20240224-010152-arnaudb.json

2024-02-23

  • 23:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T357189)', diff saved to https://phabricator.wikimedia.org/P57856 and previous config saved to /var/cache/conftool/dbconfig/20240223-235919-arnaudb.json
  • 23:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 23:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 23:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 23:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 23:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T357189)', diff saved to https://phabricator.wikimedia.org/P57855 and previous config saved to /var/cache/conftool/dbconfig/20240223-231440-arnaudb.json
  • 22:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P57854 and previous config saved to /var/cache/conftool/dbconfig/20240223-225933-arnaudb.json
  • 22:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P57853 and previous config saved to /var/cache/conftool/dbconfig/20240223-224427-arnaudb.json
  • 22:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T357189)', diff saved to https://phabricator.wikimedia.org/P57852 and previous config saved to /var/cache/conftool/dbconfig/20240223-222920-arnaudb.json
  • 21:49 sbassett: Deployed updated security mitigation for T336027
  • 21:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T357189)', diff saved to https://phabricator.wikimedia.org/P57850 and previous config saved to /var/cache/conftool/dbconfig/20240223-214211-arnaudb.json
  • 21:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 21:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 21:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T357189)', diff saved to https://phabricator.wikimedia.org/P57848 and previous config saved to /var/cache/conftool/dbconfig/20240223-214149-arnaudb.json
  • 21:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P57847 and previous config saved to /var/cache/conftool/dbconfig/20240223-212643-arnaudb.json
  • 21:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P57846 and previous config saved to /var/cache/conftool/dbconfig/20240223-211136-arnaudb.json
  • 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2196.codfw.wmnet with reason: host reimage
  • 21:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2196.codfw.wmnet with reason: host reimage
  • 20:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T357189)', diff saved to https://phabricator.wikimedia.org/P57845 and previous config saved to /var/cache/conftool/dbconfig/20240223-205630-arnaudb.json
  • 20:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2196.codfw.wmnet with OS bookworm
  • 20:42 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 20:23 RhinosF1: [relog due to stashbot errors] jhancock@cumin2002 ran cookbook sre.hardware.upgrade-firmware for hosts db2201/db2204/db2197/db2198/db2202/db2203/db2205; all ended with END (PASS)
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2205']
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2204']
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2203']
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2202']
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2201']
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2199']
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2198']
  • 19:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2197']
  • 19:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T357189)', diff saved to https://phabricator.wikimedia.org/P57844 and previous config saved to /var/cache/conftool/dbconfig/20240223-195835-arnaudb.json
  • 19:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 19:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 19:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T357189)', diff saved to https://phabricator.wikimedia.org/P57843 and previous config saved to /var/cache/conftool/dbconfig/20240223-195802-arnaudb.json
  • 19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P57841 and previous config saved to /var/cache/conftool/dbconfig/20240223-192749-arnaudb.json
  • 19:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T357189)', diff saved to https://phabricator.wikimedia.org/P57840 and previous config saved to /var/cache/conftool/dbconfig/20240223-191243-arnaudb.json
  • 19:04 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1036.eqiad.wmnet with reason: Bootstrapping — T354560
  • 19:04 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1036.eqiad.wmnet with reason: Bootstrapping — T354560
  • 19:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2220']
  • 19:03 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2220']
  • 19:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2220']
  • 19:03 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2220']
  • 19:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2219']
  • 19:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2219']
  • 19:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2218']
  • 19:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2218']
  • 19:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2217']
  • 19:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2217']
  • 19:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2216']
  • 19:01 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2216']
  • 19:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2215']
  • 19:01 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2215']
  • 18:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2200']
  • 18:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2214']
  • 18:57 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2214']
  • 18:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2213']
  • 18:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2213']
  • 18:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2212']
  • 18:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2212']
  • 18:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2211']
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2211']
  • 18:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2210']
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2210']
  • 18:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2209']
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2209']
  • 18:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2208']
  • 18:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2208']
  • 18:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2207']
  • 18:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2200']
  • 18:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2207']
  • 18:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2207']
  • 18:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2207']
  • 18:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2206']
  • 18:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2196']
  • 18:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2206']
  • 18:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2205']
  • 18:51 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2205']
  • 18:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2204']
  • 18:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2204']
  • 18:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2202']
  • 18:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2202']
  • 18:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2201']
  • 18:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2201']
  • 18:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2200']
  • 18:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2200']
  • 18:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2199']
  • 18:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2199']
  • 18:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2198']
  • 18:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2198']
  • 18:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2197']
  • 18:47 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2197']
  • 18:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2196']
  • 18:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2196']
  • 18:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2196']
  • 18:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T357189)', diff saved to https://phabricator.wikimedia.org/P57839 and previous config saved to /var/cache/conftool/dbconfig/20240223-181437-arnaudb.json
  • 18:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 18:14 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T357189)', diff saved to https://phabricator.wikimedia.org/P57838 and previous config saved to /var/cache/conftool/dbconfig/20240223-181416-arnaudb.json
  • 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P57835 and previous config saved to /var/cache/conftool/dbconfig/20240223-175909-arnaudb.json
  • 17:55 Daimona: T357007 Running mwscript CampaignEvents:GenerateInvitationList --wiki=metawiki --listfile=/home/daimona/list.txt
  • 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P57834 and previous config saved to /var/cache/conftool/dbconfig/20240223-174403-arnaudb.json
  • 17:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T357189)', diff saved to https://phabricator.wikimedia.org/P57833 and previous config saved to /var/cache/conftool/dbconfig/20240223-172856-arnaudb.json
  • 16:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T357189)', diff saved to https://phabricator.wikimedia.org/P57832 and previous config saved to /var/cache/conftool/dbconfig/20240223-162426-arnaudb.json
  • 16:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 16:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T357189)', diff saved to https://phabricator.wikimedia.org/P57831 and previous config saved to /var/cache/conftool/dbconfig/20240223-162351-arnaudb.json
  • 16:09 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1100.eqiad.wmnet,service=(cdn|ats-be)
  • 16:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P57830 and previous config saved to /var/cache/conftool/dbconfig/20240223-160845-arnaudb.json
  • 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P57829 and previous config saved to /var/cache/conftool/dbconfig/20240223-155338-arnaudb.json
  • 15:44 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 15:43 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 15:40 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 15:39 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 15:39 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 15:38 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:38 claime: Deploying 1005974 to eventgate-main - T249745
  • 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T357189)', diff saved to https://phabricator.wikimedia.org/P57828 and previous config saved to /var/cache/conftool/dbconfig/20240223-153832-arnaudb.json
  • 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2196.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2196.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:48 hnowlan@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=(mw2351.codfw.wmnet|mw2353.codfw.wmnet|mw2382.codfw.wmnet|mw2394.codfw.wmnet|mw2419.codfw.wmnet|mw2426.codfw.wmnet|mw2428.codfw.wmnet|mw2444.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 14:43 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster - update (duration: 00m 12s)
  • 14:43 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster - update
  • 14:42 hnowlan: running `homer 'cr*codfw*' commit 'T354791'` for reclaimed codfw jobrunners moving to k8s workers
  • 14:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T357189)', diff saved to https://phabricator.wikimedia.org/P57827 and previous config saved to /var/cache/conftool/dbconfig/20240223-143337-arnaudb.json
  • 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T357189)', diff saved to https://phabricator.wikimedia.org/P57826 and previous config saved to /var/cache/conftool/dbconfig/20240223-143246-arnaudb.json
  • 14:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 14:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P57825 and previous config saved to /var/cache/conftool/dbconfig/20240223-141740-arnaudb.json
  • 14:12 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P57824 and previous config saved to /var/cache/conftool/dbconfig/20240223-140233-arnaudb.json
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T357189)', diff saved to https://phabricator.wikimedia.org/P57823 and previous config saved to /var/cache/conftool/dbconfig/20240223-134727-arnaudb.json
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cumin1001.eqiad.wmnet
  • 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cumin1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cumin1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 13:19 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2394.codfw.wmnet with OS bullseye
  • 13:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cumin1001.eqiad.wmnet
  • 13:09 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2419.codfw.wmnet with OS bullseye
  • 13:08 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2428.codfw.wmnet with OS bullseye
  • 13:07 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2426.codfw.wmnet with OS bullseye
  • 13:04 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2382.codfw.wmnet with OS bullseye
  • 12:57 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2351.codfw.wmnet with OS bullseye
  • 12:55 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2444.codfw.wmnet with OS bullseye
  • 12:53 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2394.codfw.wmnet with reason: host reimage
  • 12:52 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2353.codfw.wmnet with OS bullseye
  • 12:49 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2428.codfw.wmnet with reason: host reimage
  • 12:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T357189)', diff saved to https://phabricator.wikimedia.org/P57822 and previous config saved to /var/cache/conftool/dbconfig/20240223-124710-arnaudb.json
  • 12:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T357189)', diff saved to https://phabricator.wikimedia.org/P57821 and previous config saved to /var/cache/conftool/dbconfig/20240223-124648-arnaudb.json
  • 12:46 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2419.codfw.wmnet with reason: host reimage
  • 12:43 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2426.codfw.wmnet with reason: host reimage
  • 12:41 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2382.codfw.wmnet with reason: host reimage
  • 12:39 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2351.codfw.wmnet with reason: host reimage
  • 12:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2444.codfw.wmnet with reason: host reimage
  • 12:34 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2353.codfw.wmnet with reason: host reimage
  • 12:34 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2428.codfw.wmnet with reason: host reimage
  • 12:32 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2394.codfw.wmnet with reason: host reimage
  • 12:32 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2444.codfw.wmnet with reason: host reimage
  • 12:32 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2419.codfw.wmnet with reason: host reimage
  • 12:32 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2426.codfw.wmnet with reason: host reimage
  • 12:32 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2382.codfw.wmnet with reason: host reimage
  • 12:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P57820 and previous config saved to /var/cache/conftool/dbconfig/20240223-123141-arnaudb.json
  • 12:31 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2351.codfw.wmnet with reason: host reimage
  • 12:31 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2353.codfw.wmnet with reason: host reimage
  • 12:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P57819 and previous config saved to /var/cache/conftool/dbconfig/20240223-121635-arnaudb.json
  • 12:16 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2444.codfw.wmnet with OS bullseye
  • 12:16 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2428.codfw.wmnet with OS bullseye
  • 12:15 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2426.codfw.wmnet with OS bullseye
  • 12:15 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2419.codfw.wmnet with OS bullseye
  • 12:15 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2394.codfw.wmnet with OS bullseye
  • 12:15 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2382.codfw.wmnet with OS bullseye
  • 12:15 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2353.codfw.wmnet with OS bullseye
  • 12:15 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2351.codfw.wmnet with OS bullseye
  • 12:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T357189)', diff saved to https://phabricator.wikimedia.org/P57818 and previous config saved to /var/cache/conftool/dbconfig/20240223-120129-arnaudb.json
  • 11:58 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --start '["75194261"]' | tee -a ~/T315510-enwiki-2 # in tmux
  • 11:52 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki viwiki --current --all --touched-after=20230613000000 --start '["7939741"]' 2>&1 | tee ~/T315510-viwiki # in tmux
  • 11:49 Lucas_WMDE: STOP persistRevisionThreadItems on viwiki for T315510, had been throwing tons of errors since at least Wednesday
  • 11:32 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw2384.codfw.wmnet|mw2385.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 11:07 hnowlan: running `homer 'cr*codfw*' commit 'T351074'` for two more appservers becoming k8s workers
  • 11:01 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw2369.codfw.wmnet|mw2367.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T357189)', diff saved to https://phabricator.wikimedia.org/P57816 and previous config saved to /var/cache/conftool/dbconfig/20240223-105929-arnaudb.json
  • 10:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 10:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T357189)', diff saved to https://phabricator.wikimedia.org/P57815 and previous config saved to /var/cache/conftool/dbconfig/20240223-105907-arnaudb.json
  • 10:52 hnowlan: running `homer 'cr*codfw*' commit 'T351074'` for new appservers being migrated to k8s workers
  • 10:49 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw1458.eqiad.wmnet|mw1467.eqiad.wmnet|mw1468.eqiad.wmnet|mw1483.eqiad.wmnet|mw1484.eqiad.wmnet|mw1485.eqiad.wmnet|mw1494.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 10:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P57814 and previous config saved to /var/cache/conftool/dbconfig/20240223-104401-arnaudb.json
  • 10:41 hnowlan: running `homer 'cr*eqiad*' commit 'T351074' && homer 'lsw1-f2-eqiad*' commit 'T351074'` for jobrunners being migrated to k8s workers
  • 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P57813 and previous config saved to /var/cache/conftool/dbconfig/20240223-102854-arnaudb.json
  • 10:26 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 10:26 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 10:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T357189)', diff saved to https://phabricator.wikimedia.org/P57811 and previous config saved to /var/cache/conftool/dbconfig/20240223-101348-arnaudb.json
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57810 and previous config saved to /var/cache/conftool/dbconfig/20240223-093559-root.json
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57809 and previous config saved to /var/cache/conftool/dbconfig/20240223-092053-root.json
  • 09:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T357189)', diff saved to https://phabricator.wikimedia.org/P57808 and previous config saved to /var/cache/conftool/dbconfig/20240223-090913-arnaudb.json
  • 09:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57807 and previous config saved to /var/cache/conftool/dbconfig/20240223-090549-root.json
  • 08:54 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging GoranSMilovanovic out of all services on: 8 hosts
  • 08:53 root@cumin2002: START - Cookbook sre.idm.logout Logging GoranSMilovanovic out of all services on: 8 hosts
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57806 and previous config saved to /var/cache/conftool/dbconfig/20240223-085043-root.json
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57805 and previous config saved to /var/cache/conftool/dbconfig/20240223-083538-root.json
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57804 and previous config saved to /var/cache/conftool/dbconfig/20240223-082033-root.json
  • 08:20 godog: rollout prometheus-rsyslog-exporter new version to remaining hosts, caching sites - T357616
  • 08:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57803 and previous config saved to /var/cache/conftool/dbconfig/20240223-080528-root.json
  • 08:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1031.eqiad.wmnet with OS bookworm
  • 07:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1031.eqiad.wmnet with reason: host reimage
  • 07:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1031.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui: Install 10.6.17 on pc1014 T357089
  • 07:28 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1031.eqiad.wmnet with OS bookworm
  • 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1031 T358180', diff saved to https://phabricator.wikimedia.org/P57802 and previous config saved to /var/cache/conftool/dbconfig/20240223-071952-root.json
  • 01:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T357189)', diff saved to https://phabricator.wikimedia.org/P57801 and previous config saved to /var/cache/conftool/dbconfig/20240223-015907-arnaudb.json
  • 01:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P57800 and previous config saved to /var/cache/conftool/dbconfig/20240223-014400-arnaudb.json
  • 01:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P57799 and previous config saved to /var/cache/conftool/dbconfig/20240223-012853-arnaudb.json
  • 01:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T357189)', diff saved to https://phabricator.wikimedia.org/P57798 and previous config saved to /var/cache/conftool/dbconfig/20240223-011347-arnaudb.json
  • 01:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T357189)', diff saved to https://phabricator.wikimedia.org/P57797 and previous config saved to /var/cache/conftool/dbconfig/20240223-011128-arnaudb.json
  • 01:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 01:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 01:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T357189)', diff saved to https://phabricator.wikimedia.org/P57796 and previous config saved to /var/cache/conftool/dbconfig/20240223-011107-arnaudb.json
  • 00:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P57795 and previous config saved to /var/cache/conftool/dbconfig/20240223-005601-arnaudb.json
  • 00:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P57794 and previous config saved to /var/cache/conftool/dbconfig/20240223-004054-arnaudb.json
  • 00:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T357189)', diff saved to https://phabricator.wikimedia.org/P57793 and previous config saved to /var/cache/conftool/dbconfig/20240223-002547-arnaudb.json
  • 00:14 zabe@deploy2002: Finished scap: Backport for block: Pass wikiId to DatabaseBlock::getId in DatabaseBlockStore (T358208) (duration: 11m 02s)
  • 00:12 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="Grandmaster Huon" . # T358022
  • 00:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T357189)', diff saved to https://phabricator.wikimedia.org/P57791 and previous config saved to /var/cache/conftool/dbconfig/20240223-000920-arnaudb.json
  • 00:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57790 and previous config saved to /var/cache/conftool/dbconfig/20240223-000858-arnaudb.json
  • 00:06 zabe@deploy2002: zabe: Continuing with sync
  • 00:04 zabe@deploy2002: zabe: Backport for block: Pass wikiId to DatabaseBlock::getId in DatabaseBlockStore (T358208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:03 zabe@deploy2002: Started scap: Backport for block: Pass wikiId to DatabaseBlock::getId in DatabaseBlockStore (T358208)

2024-02-22

  • 23:59 tstarling@deploy2002: Finished scap: (no justification provided) (duration: 09m 40s)
  • 23:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P57789 and previous config saved to /var/cache/conftool/dbconfig/20240222-235351-arnaudb.json
  • 23:49 tstarling@deploy2002: Started scap: (no justification provided)
  • 23:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P57788 and previous config saved to /var/cache/conftool/dbconfig/20240222-233845-arnaudb.json
  • 23:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1035.eqiad.wmnet with reason: Bootstrapping — T354560
  • 23:35 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1035.eqiad.wmnet with reason: Bootstrapping — T354560
  • 23:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57787 and previous config saved to /var/cache/conftool/dbconfig/20240222-232338-arnaudb.json
  • 23:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57786 and previous config saved to /var/cache/conftool/dbconfig/20240222-232118-arnaudb.json
  • 23:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 23:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 23:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57785 and previous config saved to /var/cache/conftool/dbconfig/20240222-232056-arnaudb.json
  • 23:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P57784 and previous config saved to /var/cache/conftool/dbconfig/20240222-230549-arnaudb.json
  • 22:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P57783 and previous config saved to /var/cache/conftool/dbconfig/20240222-225042-arnaudb.json
  • 22:41 cjming: end of UTC late backport window
  • 22:40 cjming@deploy2002: Finished scap: Backport for Improve chunked upload jobs and abort assemble job if already in progress (T200820) (duration: 09m 46s)
  • 22:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57782 and previous config saved to /var/cache/conftool/dbconfig/20240222-223536-arnaudb.json
  • 22:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T357189)', diff saved to https://phabricator.wikimedia.org/P57781 and previous config saved to /var/cache/conftool/dbconfig/20240222-223314-arnaudb.json
  • 22:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 22:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 22:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T357189)', diff saved to https://phabricator.wikimedia.org/P57780 and previous config saved to /var/cache/conftool/dbconfig/20240222-223251-arnaudb.json
  • 22:32 cjming@deploy2002: bawolff and cjming: Continuing with sync
  • 22:32 cjming@deploy2002: bawolff and cjming: Backport for Improve chunked upload jobs and abort assemble job if already in progress (T200820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:30 cjming@deploy2002: Started scap: Backport for Improve chunked upload jobs and abort assemble job if already in progress (T200820)
  • 22:30 cjming@deploy2002: Finished scap: Backport for testwiki: Allow modifying email in account vanishing contact form. (T343536) (duration: 09m 58s)
  • 22:22 cjming@deploy2002: cjming and dbrant: Continuing with sync
  • 22:21 cjming@deploy2002: cjming and dbrant: Backport for testwiki: Allow modifying email in account vanishing contact form. (T343536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:20 cjming@deploy2002: Started scap: Backport for testwiki: Allow modifying email in account vanishing contact form. (T343536)
  • 22:18 cjming@deploy2002: Finished scap: Backport for Add verbiage for Account Vanishing contact page. (T343536) (duration: 27m 47s)
  • 22:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P57779 and previous config saved to /var/cache/conftool/dbconfig/20240222-221745-arnaudb.json
  • 22:06 cjming@deploy2002: dbrant and cjming: Continuing with sync
  • 22:05 cjming@deploy2002: dbrant and cjming: Backport for Add verbiage for Account Vanishing contact page. (T343536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P57778 and previous config saved to /var/cache/conftool/dbconfig/20240222-220238-arnaudb.json
  • 21:51 cjming@deploy2002: Started scap: Backport for Add verbiage for Account Vanishing contact page. (T343536)
  • 21:50 cjming@deploy2002: Finished scap: Backport for Change font-size "Small" label to "Standard" (T358074) (duration: 29m 07s)
  • 21:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T357189)', diff saved to https://phabricator.wikimedia.org/P57777 and previous config saved to /var/cache/conftool/dbconfig/20240222-214732-arnaudb.json
  • 21:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T357189)', diff saved to https://phabricator.wikimedia.org/P57776 and previous config saved to /var/cache/conftool/dbconfig/20240222-214310-arnaudb.json
  • 21:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 21:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 21:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T357189)', diff saved to https://phabricator.wikimedia.org/P57775 and previous config saved to /var/cache/conftool/dbconfig/20240222-214221-arnaudb.json
  • 21:39 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:35 cjming@deploy2002: cjming and jdlrobson: Backport for Change font-size "Small" label to "Standard" (T358074) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P57774 and previous config saved to /var/cache/conftool/dbconfig/20240222-212715-arnaudb.json
  • 21:21 cjming@deploy2002: Started scap: Backport for Change font-size "Small" label to "Standard" (T358074)
  • 21:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 21:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P57773 and previous config saved to /var/cache/conftool/dbconfig/20240222-211208-arnaudb.json
  • 21:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 20:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 20:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T357189)', diff saved to https://phabricator.wikimedia.org/P57772 and previous config saved to /var/cache/conftool/dbconfig/20240222-205701-arnaudb.json
  • 20:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T357189)', diff saved to https://phabricator.wikimedia.org/P57771 and previous config saved to /var/cache/conftool/dbconfig/20240222-205440-arnaudb.json
  • 20:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 20:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 20:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T357189)', diff saved to https://phabricator.wikimedia.org/P57770 and previous config saved to /var/cache/conftool/dbconfig/20240222-205417-arnaudb.json
  • 20:45 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 20:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P57769 and previous config saved to /var/cache/conftool/dbconfig/20240222-203911-arnaudb.json
  • 20:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P57768 and previous config saved to /var/cache/conftool/dbconfig/20240222-202404-arnaudb.json
  • 20:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T357189)', diff saved to https://phabricator.wikimedia.org/P57767 and previous config saved to /var/cache/conftool/dbconfig/20240222-200858-arnaudb.json
  • 20:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T357189)', diff saved to https://phabricator.wikimedia.org/P57766 and previous config saved to /var/cache/conftool/dbconfig/20240222-200636-arnaudb.json
  • 20:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 20:06 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye
  • 20:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 20:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57765 and previous config saved to /var/cache/conftool/dbconfig/20240222-200614-arnaudb.json
  • 20:00 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:58 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:58 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:57 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:56 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:56 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 19:53 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 19:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 19:52 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 19:52 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 19:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P57764 and previous config saved to /var/cache/conftool/dbconfig/20240222-195108-arnaudb.json
  • 19:50 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 19:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:49 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:40 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 19:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P57763 and previous config saved to /var/cache/conftool/dbconfig/20240222-193601-arnaudb.json
  • 19:30 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:30 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup incorrect asset tags - robh@cumin2002"
  • 19:29 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cleanup incorrect asset tags - robh@cumin2002"
  • 19:27 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:23 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 19:22 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 19:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57762 and previous config saved to /var/cache/conftool/dbconfig/20240222-192055-arnaudb.json
  • 19:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57761 and previous config saved to /var/cache/conftool/dbconfig/20240222-191834-arnaudb.json
  • 19:18 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.19 refs T354437
  • 19:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 19:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 19:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T357189)', diff saved to https://phabricator.wikimedia.org/P57760 and previous config saved to /var/cache/conftool/dbconfig/20240222-191810-arnaudb.json
  • 19:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2385.codfw.wmnet with OS bullseye
  • 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P57759 and previous config saved to /var/cache/conftool/dbconfig/20240222-190304-arnaudb.json
  • 18:49 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2385.codfw.wmnet with reason: host reimage
  • 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P57758 and previous config saved to /var/cache/conftool/dbconfig/20240222-184757-arnaudb.json
  • 18:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2385.codfw.wmnet with reason: host reimage
  • 18:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2384.codfw.wmnet with OS bullseye
  • 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T357189)', diff saved to https://phabricator.wikimedia.org/P57757 and previous config saved to /var/cache/conftool/dbconfig/20240222-183251-arnaudb.json
  • 18:31 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2385.codfw.wmnet with OS bullseye
  • 18:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T357189)', diff saved to https://phabricator.wikimedia.org/P57756 and previous config saved to /var/cache/conftool/dbconfig/20240222-183030-arnaudb.json
  • 18:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T357189)', diff saved to https://phabricator.wikimedia.org/P57755 and previous config saved to /var/cache/conftool/dbconfig/20240222-183009-arnaudb.json
  • 18:28 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1485.eqiad.wmnet with OS bullseye
  • 18:25 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1467.eqiad.wmnet with OS bullseye
  • 18:24 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1494.eqiad.wmnet with OS bullseye
  • 18:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 18:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 18:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1484.eqiad.wmnet with OS bullseye
  • 18:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2384.codfw.wmnet with reason: host reimage
  • 18:18 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2384.codfw.wmnet with reason: host reimage
  • 18:17 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1468.eqiad.wmnet with OS bullseye
  • 18:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P57753 and previous config saved to /var/cache/conftool/dbconfig/20240222-181502-arnaudb.json
  • 18:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1483.eqiad.wmnet with OS bullseye
  • 18:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1458.eqiad.wmnet with OS bullseye
  • 18:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
  • 18:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1467.eqiad.wmnet with reason: host reimage
  • 18:04 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
  • 18:04 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:04 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:04 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:03 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
  • 18:03 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:03 hnowlan@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host mw2384.codfw.wmnet with OS bullseye
  • 18:03 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:02 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1494.eqiad.wmnet with reason: host reimage
  • 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P57752 and previous config saved to /var/cache/conftool/dbconfig/20240222-175956-arnaudb.json
  • 17:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1468.eqiad.wmnet with reason: host reimage
  • 17:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
  • 17:54 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1494.eqiad.wmnet with reason: host reimage
  • 17:54 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
  • 17:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1458.eqiad.wmnet with reason: host reimage
  • 17:54 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
  • 17:54 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
  • 17:54 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1468.eqiad.wmnet with reason: host reimage
  • 17:52 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1467.eqiad.wmnet with reason: host reimage
  • 17:52 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1458.eqiad.wmnet with reason: host reimage
  • 17:51 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
  • 17:45 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 17:44 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T357189)', diff saved to https://phabricator.wikimedia.org/P57751 and previous config saved to /var/cache/conftool/dbconfig/20240222-174449-arnaudb.json
  • 17:44 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 17:43 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 17:43 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 17:43 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 17:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T357189)', diff saved to https://phabricator.wikimedia.org/P57750 and previous config saved to /var/cache/conftool/dbconfig/20240222-174328-arnaudb.json
  • 17:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 17:43 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 17:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 17:42 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1494.eqiad.wmnet with OS bullseye
  • 17:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 17:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T357189)', diff saved to https://phabricator.wikimedia.org/P57749 and previous config saved to /var/cache/conftool/dbconfig/20240222-174138-arnaudb.json
  • 17:41 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1485.eqiad.wmnet with OS bullseye
  • 17:41 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1484.eqiad.wmnet with OS bullseye
  • 17:41 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1483.eqiad.wmnet with OS bullseye
  • 17:41 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1468.eqiad.wmnet with OS bullseye
  • 17:40 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 17:39 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1467.eqiad.wmnet with OS bullseye
  • 17:39 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1458.eqiad.wmnet with OS bullseye
  • 17:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 17:36 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 17:35 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye
  • 17:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P57748 and previous config saved to /var/cache/conftool/dbconfig/20240222-172632-arnaudb.json
  • 17:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P57747 and previous config saved to /var/cache/conftool/dbconfig/20240222-171125-arnaudb.json
  • 17:05 topranks: disabling IPv6 RAs for private1-a-codfw vlan on codfw core routers T355544
  • 16:58 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Remove legacy codfw vc switches from synced hiera data after netbox status change - cmooney@cumin1002 - T355544"
  • 16:57 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Remove legacy codfw vc switches from synced hiera data after netbox status change - cmooney@cumin1002 - T355544"
  • 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T357189)', diff saved to https://phabricator.wikimedia.org/P57746 and previous config saved to /var/cache/conftool/dbconfig/20240222-165619-arnaudb.json
  • 16:56 topranks: disabling link from asw-a-codfw vc to ssw1-a1-codfw and ssw1-a8-codfw T355544
  • 16:54 dancy@deploy2002: Finished scap: testing T357402 again (duration: 08m 58s)
  • 16:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T357189)', diff saved to https://phabricator.wikimedia.org/P57745 and previous config saved to /var/cache/conftool/dbconfig/20240222-165401-arnaudb.json
  • 16:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57744 and previous config saved to /var/cache/conftool/dbconfig/20240222-165312-arnaudb.json
  • 16:45 dancy@deploy2002: Started scap: testing T357402 again
  • 16:43 dancy@deploy2002: sync-world aborted: testing T357402 (duration: 14m 57s)
  • 16:42 akosiaris@cumin1002: conftool action : set/pooled=inactive; selector: service=parsoid-php,name=kubernetes.*
  • 16:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P57743 and previous config saved to /var/cache/conftool/dbconfig/20240222-163806-arnaudb.json
  • 16:36 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:36 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:30 fabfur@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=(cdn|ats-be)
  • 16:30 fabfur@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=(cdn|ats-be)
  • 16:28 fabfur@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[2031-2032].codfw.wmnet
  • 16:28 fabfur@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp[2031-2032].codfw.wmnet
  • 16:28 dancy@deploy2002: Started scap: testing T357402
  • 16:26 dancy@deploy2002: Installation of scap version "4.66.0" completed for 458 hosts
  • 16:25 dancy@deploy2002: Installing scap version "4.66.0" for 458 hosts
  • 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P57742 and previous config saved to /var/cache/conftool/dbconfig/20240222-162300-arnaudb.json
  • 16:22 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 16:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: After recloning', diff saved to https://phabricator.wikimedia.org/P57741 and previous config saved to /var/cache/conftool/dbconfig/20240222-162151-root.json
  • 16:19 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 16:16 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 16:11 mvernon@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
  • 16:11 Emperor: repool codfs-mw T355868
  • 16:10 Emperor: repool thanos-fe2002 T355868
  • 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57740 and previous config saved to /var/cache/conftool/dbconfig/20240222-160753-arnaudb.json
  • 16:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: After recloning', diff saved to https://phabricator.wikimedia.org/P57739 and previous config saved to /var/cache/conftool/dbconfig/20240222-160646-root.json
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T357189)', diff saved to https://phabricator.wikimedia.org/P57738 and previous config saved to /var/cache/conftool/dbconfig/20240222-160534-arnaudb.json
  • 16:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 16:05 volans@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57737 and previous config saved to /var/cache/conftool/dbconfig/20240222-160512-arnaudb.json
  • 16:04 volans@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 16:00 topranks: Commencing network maintenance migrating servers to new switch codfw rack B2 T355868
  • 15:58 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host testvm2002.codfw.wmnet with OS bullseye
  • 15:57 hnowlan: depooling mw[1458,1467-1468,1483-1485,1494].eqiad.wmnet in advance of reimaging
  • 15:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: Migrating servers in codfw rack B2 to lsw1-b2-codfw
  • 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: Migrating servers in codfw rack B2 to lsw1-b2-codfw
  • 15:54 mvernon@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=swift,name=codfw
  • 15:54 Emperor: depool codfs-mw T355868
  • 15:53 Emperor: depool thanos-fe2002 T355868
  • 15:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 50%: After recloning', diff saved to https://phabricator.wikimedia.org/P57736 and previous config saved to /var/cache/conftool/dbconfig/20240222-155141-root.json
  • 15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P57735 and previous config saved to /var/cache/conftool/dbconfig/20240222-155005-arnaudb.json
  • 15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b2-codfw.mgmt with reason: prepping for server uplink migration codfw rack b2
  • 15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,cr[1-2]-codfw,lsw1-b2-codfw.mgmt with reason: prepping for server uplink migration codfw rack b2
  • 15:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2031-2032].codfw.wmnet with reason: T355868
  • 15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2031-2032].codfw.wmnet with reason: T355868
  • 15:39 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster (duration: 00m 16s)
  • 15:39 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@b115452]: Deploy Refine job POC on test cluster
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: After recloning', diff saved to https://phabricator.wikimedia.org/P57734 and previous config saved to /var/cache/conftool/dbconfig/20240222-153636-root.json
  • 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P57733 and previous config saved to /var/cache/conftool/dbconfig/20240222-153459-arnaudb.json
  • 15:32 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 15:27 moritzm: installing glib2.0 security updates on bullseye
  • 15:27 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: After recloning', diff saved to https://phabricator.wikimedia.org/P57732 and previous config saved to /var/cache/conftool/dbconfig/20240222-152131-root.json
  • 15:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57731 and previous config saved to /var/cache/conftool/dbconfig/20240222-151952-arnaudb.json
  • 15:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T357189)', diff saved to https://phabricator.wikimedia.org/P57730 and previous config saved to /var/cache/conftool/dbconfig/20240222-151733-arnaudb.json
  • 15:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 15:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 15:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T357189)', diff saved to https://phabricator.wikimedia.org/P57729 and previous config saved to /var/cache/conftool/dbconfig/20240222-151701-arnaudb.json
  • 15:15 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 15:15 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: service=parsoid-php,name=kubernetes.*
  • 15:15 akosiaris: T357392 pool 46 kubernetes hosts of parsoid-php with a weight of 1. Since the 42 parse hosts are at weight 110, that means roughly 1% of traffic (46 / (46 + 42 × 110) ≈ 0.99%) goes to the mw-parsoid deployment, aka mw-on-k8s
  • 15:13 akosiaris@cumin1002: conftool action : set/weight=1; selector: service=parsoid-php,name=kubernetes.*
  • 15:12 akosiaris@cumin1002: conftool action : set/weight=110; selector: service=parsoid-php,name=(pars.*|mw.*)
  • 15:12 akosiaris: Bump weight of old parsoid hosts from 10 to 110. This is a noop right now but will make the calculations spelled out in T357392 possible later.
  • 14:55 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 14:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 14:55 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 14:55 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2149 (re)pooling @ 1%: After recloning', diff saved to https://phabricator.wikimedia.org/P57726 and previous config saved to /var/cache/conftool/dbconfig/20240222-145120-root.json
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P57725 and previous config saved to /var/cache/conftool/dbconfig/20240222-144648-arnaudb.json
  • 14:45 cgoubert@deploy2002: Finished scap: Backport for Enable $wgLocalHTTPProxy on group1 wikis (T298265) (duration: 17m 46s)
  • 14:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-redacteddb1001.eqiad.wmnet with OS bullseye
  • 14:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bullseye
  • 14:37 cgoubert@deploy2002: cgoubert: Continuing with sync
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T357189)', diff saved to https://phabricator.wikimedia.org/P57724 and previous config saved to /var/cache/conftool/dbconfig/20240222-143141-arnaudb.json
  • 14:29 cgoubert@deploy2002: cgoubert: Backport for Enable $wgLocalHTTPProxy on group1 wikis (T298265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T357189)', diff saved to https://phabricator.wikimedia.org/P57723 and previous config saved to /var/cache/conftool/dbconfig/20240222-142921-arnaudb.json
  • 14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57722 and previous config saved to /var/cache/conftool/dbconfig/20240222-142859-arnaudb.json
  • 14:28 cgoubert@deploy2002: Started scap: Backport for Enable $wgLocalHTTPProxy on group1 wikis (T298265)
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57721 and previous config saved to /var/cache/conftool/dbconfig/20240222-141508-root.json
  • 14:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P57720 and previous config saved to /var/cache/conftool/dbconfig/20240222-141353-arnaudb.json
  • 14:03 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57719 and previous config saved to /var/cache/conftool/dbconfig/20240222-140003-root.json
  • 13:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P57718 and previous config saved to /var/cache/conftool/dbconfig/20240222-135846-arnaudb.json
  • 13:53 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:46 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 13:46 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:45 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:45 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57717 and previous config saved to /var/cache/conftool/dbconfig/20240222-134458-root.json
  • 13:44 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57716 and previous config saved to /var/cache/conftool/dbconfig/20240222-134340-arnaudb.json
  • 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57715 and previous config saved to /var/cache/conftool/dbconfig/20240222-134120-arnaudb.json
  • 13:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57714 and previous config saved to /var/cache/conftool/dbconfig/20240222-134059-arnaudb.json
  • 13:40 aborrero@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1034
  • 13:40 aborrero@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1034
  • 13:34 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57713 and previous config saved to /var/cache/conftool/dbconfig/20240222-132953-root.json
  • 13:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P57712 and previous config saved to /var/cache/conftool/dbconfig/20240222-132551-arnaudb.json
  • 13:20 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:20 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 13:18 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57711 and previous config saved to /var/cache/conftool/dbconfig/20240222-131448-root.json
  • 13:13 godog: bounce grafana to apply new datasources
  • 13:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P57710 and previous config saved to /var/cache/conftool/dbconfig/20240222-131045-arnaudb.json
  • 13:05 Emperor: ms-codfw set ACL {"read-only":["mw:backup"]} T269108
  • 13:03 Emperor: ms-eqiad set ACL {"read-only":["mw:backup"]} T269108
  • 13:02 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading gitlab
  • 13:01 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 12:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57709 and previous config saved to /var/cache/conftool/dbconfig/20240222-125943-root.json
  • 12:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57708 and previous config saved to /var/cache/conftool/dbconfig/20240222-125538-arnaudb.json
  • 12:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57707 and previous config saved to /var/cache/conftool/dbconfig/20240222-125319-arnaudb.json
  • 12:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 12:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 12:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57706 and previous config saved to /var/cache/conftool/dbconfig/20240222-125257-arnaudb.json
  • 12:52 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading gitlab
  • 12:45 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading gitlab
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57705 and previous config saved to /var/cache/conftool/dbconfig/20240222-124438-root.json
  • 12:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P57704 and previous config saved to /var/cache/conftool/dbconfig/20240222-123750-arnaudb.json
  • 12:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P57703 and previous config saved to /var/cache/conftool/dbconfig/20240222-122244-arnaudb.json
  • 12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57702 and previous config saved to /var/cache/conftool/dbconfig/20240222-120737-arnaudb.json
  • 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57701 and previous config saved to /var/cache/conftool/dbconfig/20240222-120518-arnaudb.json
  • 12:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57700 and previous config saved to /var/cache/conftool/dbconfig/20240222-120445-arnaudb.json
  • 12:02 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading gitlab
  • 11:55 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading gitlab
  • 11:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 11:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 11:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 11:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 11:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1028.eqiad.wmnet with OS bookworm
  • 11:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P57699 and previous config saved to /var/cache/conftool/dbconfig/20240222-114938-arnaudb.json
  • 11:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P57698 and previous config saved to /var/cache/conftool/dbconfig/20240222-113432-arnaudb.json
  • 11:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1028.eqiad.wmnet with reason: host reimage
  • 11:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1028.eqiad.wmnet with reason: host reimage
  • 11:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57697 and previous config saved to /var/cache/conftool/dbconfig/20240222-111925-arnaudb.json
  • 11:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T357189)', diff saved to https://phabricator.wikimedia.org/P57696 and previous config saved to /var/cache/conftool/dbconfig/20240222-111706-arnaudb.json
  • 11:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 11:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 11:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57695 and previous config saved to /var/cache/conftool/dbconfig/20240222-111644-arnaudb.json
  • 11:12 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1028.eqiad.wmnet with OS bookworm
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1028 T358180', diff saved to https://phabricator.wikimedia.org/P57694 and previous config saved to /var/cache/conftool/dbconfig/20240222-110914-root.json
  • 11:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P57693 and previous config saved to /var/cache/conftool/dbconfig/20240222-110138-arnaudb.json
  • 10:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P57692 and previous config saved to /var/cache/conftool/dbconfig/20240222-104632-arnaudb.json
  • 10:35 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 10:35 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 10:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57690 and previous config saved to /var/cache/conftool/dbconfig/20240222-103125-arnaudb.json
  • 10:31 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 10:31 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 10:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T357189)', diff saved to https://phabricator.wikimedia.org/P57689 and previous config saved to /var/cache/conftool/dbconfig/20240222-102906-arnaudb.json
  • 10:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 10:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 10:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57688 and previous config saved to /var/cache/conftool/dbconfig/20240222-102817-arnaudb.json
  • 10:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P57687 and previous config saved to /var/cache/conftool/dbconfig/20240222-101310-arnaudb.json
  • 10:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2195 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57686 and previous config saved to /var/cache/conftool/dbconfig/20240222-101123-arnaudb.json
  • 10:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57685 and previous config saved to /var/cache/conftool/dbconfig/20240222-101018-arnaudb.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57684 and previous config saved to /var/cache/conftool/dbconfig/20240222-100140-root.json
  • 09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P57683 and previous config saved to /var/cache/conftool/dbconfig/20240222-095804-arnaudb.json
  • 09:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2195 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57682 and previous config saved to /var/cache/conftool/dbconfig/20240222-095619-arnaudb.json
  • 09:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57681 and previous config saved to /var/cache/conftool/dbconfig/20240222-095513-arnaudb.json
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57680 and previous config saved to /var/cache/conftool/dbconfig/20240222-094635-root.json
  • 09:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57679 and previous config saved to /var/cache/conftool/dbconfig/20240222-094257-arnaudb.json
  • 09:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2195 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57678 and previous config saved to /var/cache/conftool/dbconfig/20240222-094114-arnaudb.json
  • 09:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57677 and previous config saved to /var/cache/conftool/dbconfig/20240222-094008-arnaudb.json
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57675 and previous config saved to /var/cache/conftool/dbconfig/20240222-093130-root.json
  • 09:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2195 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57674 and previous config saved to /var/cache/conftool/dbconfig/20240222-092609-arnaudb.json
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57673 and previous config saved to /var/cache/conftool/dbconfig/20240222-092503-arnaudb.json
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57672 and previous config saved to /var/cache/conftool/dbconfig/20240222-091626-root.json
  • 09:03 jayme: restart prometheus@k8s in eqiad - T343529
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57671 and previous config saved to /var/cache/conftool/dbconfig/20240222-090121-root.json
  • 09:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2143.codfw.wmnet
  • 09:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2195.codfw.wmnet
  • 08:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1180.eqiad.wmnet
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After migration', diff saved to https://phabricator.wikimedia.org/P57670 and previous config saved to /var/cache/conftool/dbconfig/20240222-085800-root.json
  • 08:56 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2195.codfw.wmnet
  • 08:55 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2143.codfw.wmnet
  • 08:55 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1180.eqiad.wmnet
  • 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'T356240 - depooling db1187 db2143 db2195', diff saved to https://phabricator.wikimedia.org/P57669 and previous config saved to /var/cache/conftool/dbconfig/20240222-085521-arnaudb.json
  • 08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2143,2195].codfw.wmnet,db1187.eqiad.wmnet with reason: Silence for reboot T356240
  • 08:52 jayme: rolling out prometheus-rsyslog-exporter 1.0.0+git20221110-1 to wikikube nodes - T357616
  • 08:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2143,2195].codfw.wmnet,db1187.eqiad.wmnet with reason: Silence for reboot T356240
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57668 and previous config saved to /var/cache/conftool/dbconfig/20240222-084616-root.json
  • 08:44 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetmaster1002.eqiad.wmnet
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After migration', diff saved to https://phabricator.wikimedia.org/P57667 and previous config saved to /var/cache/conftool/dbconfig/20240222-084255-root.json
  • 08:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T357189)', diff saved to https://phabricator.wikimedia.org/P57666 and previous config saved to /var/cache/conftool/dbconfig/20240222-084235-arnaudb.json
  • 08:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:42 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host puppetmaster1002.eqiad.wmnet
  • 08:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57665 and previous config saved to /var/cache/conftool/dbconfig/20240222-083111-root.json
  • 08:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2033.codfw.wmnet with OS bookworm
  • 08:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 18779
  • 08:28 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 18779
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: After migration', diff saved to https://phabricator.wikimedia.org/P57664 and previous config saved to /var/cache/conftool/dbconfig/20240222-082750-root.json
  • 08:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138997
  • 08:24 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 138997
  • 08:24 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 138997
  • 08:23 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 138997
  • 08:21 hoo@deploy2002: Finished scap: Backport for Migrate to virtual domain mapping (T348526), Migrate to virtual domain mapping (T348526) (duration: 14m 44s)
  • 08:20 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 08:20 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 08:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2033.codfw.wmnet with reason: host reimage
  • 08:13 hoo@deploy2002: hoo: Continuing with sync
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After migration', diff saved to https://phabricator.wikimedia.org/P57663 and previous config saved to /var/cache/conftool/dbconfig/20240222-081243-root.json
  • 08:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2033.codfw.wmnet with reason: host reimage
  • 08:08 hoo@deploy2002: hoo: Backport for Migrate to virtual domain mapping (T348526), Migrate to virtual domain mapping (T348526) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:06 hoo@deploy2002: Started scap: Backport for Migrate to virtual domain mapping (T348526), Migrate to virtual domain mapping (T348526)
  • 07:58 taavi: taavi@puppetmaster1002 ~ $ sudo systemctl restart apache2 # lots of 'Error 500 on SERVER: Server Error: undefined method `content' for nil:NilClass' in the logs, seems to have helped
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After migration', diff saved to https://phabricator.wikimedia.org/P57662 and previous config saved to /var/cache/conftool/dbconfig/20240222-075738-root.json
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2033.codfw.wmnet with OS bookworm
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57661 and previous config saved to /var/cache/conftool/dbconfig/20240222-075448-root.json
  • 07:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 5%: After migration', diff saved to https://phabricator.wikimedia.org/P57660 and previous config saved to /var/cache/conftool/dbconfig/20240222-074233-root.json
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2033 T358080', diff saved to https://phabricator.wikimedia.org/P57659 and previous config saved to /var/cache/conftool/dbconfig/20240222-074042-root.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57658 and previous config saved to /var/cache/conftool/dbconfig/20240222-073943-root.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2026 as es2 codfw master T358080', diff saved to https://phabricator.wikimedia.org/P57657 and previous config saved to /var/cache/conftool/dbconfig/20240222-073017-marostegui.json
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1033 (re)pooling @ 1%: After migration', diff saved to https://phabricator.wikimedia.org/P57656 and previous config saved to /var/cache/conftool/dbconfig/20240222-072729-root.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57655 and previous config saved to /var/cache/conftool/dbconfig/20240222-072438-root.json
  • 07:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1033.eqiad.wmnet with OS bookworm
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57654 and previous config saved to /var/cache/conftool/dbconfig/20240222-070933-root.json
  • 06:58 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
  • 06:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1033.eqiad.wmnet with reason: host reimage
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57653 and previous config saved to /var/cache/conftool/dbconfig/20240222-065428-root.json
  • 06:48 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 06:48 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 06:48 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 06:47 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s3
  • 06:47 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s1
  • 06:46 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s1
  • 06:46 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s3
  • 06:44 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1033.eqiad.wmnet with OS bookworm
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1033 T358080', diff saved to https://phabricator.wikimedia.org/P57652 and previous config saved to /var/cache/conftool/dbconfig/20240222-064253-root.json
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1030 as es2 master T358080', diff saved to https://phabricator.wikimedia.org/P57651 and previous config saved to /var/cache/conftool/dbconfig/20240222-064205-marostegui.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57650 and previous config saved to /var/cache/conftool/dbconfig/20240222-063923-root.json
  • 01:29 eileen: config revision changed from 5bdfab7a to b221a95a
  • 01:28 eileen: config revision changed from 5bdfab7a to b221a95a
  • 01:27 eileen: civicrm upgraded from cd839468 to c50fcae3
  • 00:43 rzl: rzl@lists1001:~$ sudo systemctl restart mailman3 # T358020
  • 00:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T357189)', diff saved to https://phabricator.wikimedia.org/P57649 and previous config saved to /var/cache/conftool/dbconfig/20240222-001210-arnaudb.json
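
The 15:12 to 15:15 parsoid-php entries above (T357392) rest on a small piece of weight arithmetic. The sketch below reproduces it, assuming traffic is split in proportion to the sum of pooled weights and that every host in a group carries the same weight. The function name `traffic_share` is hypothetical and is not part of conftool.

```python
def traffic_share(group_hosts, group_weight, other_hosts, other_weight):
    """Fraction of requests reaching the first group, assuming the load
    balancer splits traffic in proportion to total pooled weight."""
    group_total = group_hosts * group_weight
    other_total = other_hosts * other_weight
    return group_total / (group_total + other_total)


# Figures from the log: 46 kubernetes hosts at weight 1, 42 parse hosts at weight 110.
print(f"{traffic_share(46, 1, 42, 110):.2%}")  # 0.99%

# The same kubernetes hosts against the old weight of 10 (hypothetical comparison).
print(f"{traffic_share(46, 1, 42, 10):.2%}")   # 9.87%
```

Under these assumptions, the comparison suggests why the old hosts were bumped from 10 to 110 first: with the kubernetes hosts at the lowest weight used here (1), the old weight of 10 would have sent close to 10% of traffic to mw-parsoid rather than about 1%.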

2024-02-21

  • 23:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P57648 and previous config saved to /var/cache/conftool/dbconfig/20240221-235703-arnaudb.json
  • 23:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P57647 and previous config saved to /var/cache/conftool/dbconfig/20240221-234156-arnaudb.json
  • 23:37 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:28 eileen: config revision changed from c6fc16bb to 5bdfab7a
  • 23:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T357189)', diff saved to https://phabricator.wikimedia.org/P57646 and previous config saved to /var/cache/conftool/dbconfig/20240221-232649-arnaudb.json
  • 23:24 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:24 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T357189)', diff saved to https://phabricator.wikimedia.org/P57645 and previous config saved to /var/cache/conftool/dbconfig/20240221-225350-arnaudb.json
  • 22:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T357189)', diff saved to https://phabricator.wikimedia.org/P57644 and previous config saved to /var/cache/conftool/dbconfig/20240221-225326-arnaudb.json
  • 22:51 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:50 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P57643 and previous config saved to /var/cache/conftool/dbconfig/20240221-223819-arnaudb.json
  • 22:29 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@8a290df]: new allowlisted endpoints for wdqs (duration: 11m 59s)
  • 22:25 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P57642 and previous config saved to /var/cache/conftool/dbconfig/20240221-222313-arnaudb.json
  • 22:20 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:20 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:17 ryankemper@deploy2002: Started deploy [wdqs/wdqs@8a290df]: new allowlisted endpoints for wdqs
  • 22:12 Dreamy_Jazz: Evening UTC backport window done
  • 22:10 ryankemper: [WDQS] T355868 Depooling `wdqs2024`, `wdqs2014`, `wdqs2010` in anticipation of row maintenance
  • 22:08 dreamyjazz@deploy2002: Finished scap: Backport for Pin wgGlobalBlockingAllowGlobalAccountBlocks as false on WMF wikis (T356923 T356924) (duration: 10m 16s)
  • 22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T357189)', diff saved to https://phabricator.wikimedia.org/P57641 and previous config saved to /var/cache/conftool/dbconfig/20240221-220807-arnaudb.json
  • 22:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2041*,elastic2042*,elastic2057*,elastic2063*,elastic2064*,elastic2077*,elastic2078*,elastic2092*,elastic2093*,elastic2094* for switch maintenance - bking@cumin2002 - T355860
  • 22:02 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2041*,elastic2042*,elastic2057*,elastic2063*,elastic2064*,elastic2077*,elastic2078*,elastic2092*,elastic2093*,elastic2094* for switch maintenance - bking@cumin2002 - T355860
  • 22:00 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 22:00 dreamyjazz@deploy2002: dreamyjazz: Backport for Pin wgGlobalBlockingAllowGlobalAccountBlocks as false on WMF wikis (T356923 T356924) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:58 dreamyjazz@deploy2002: Started scap: Backport for Pin wgGlobalBlockingAllowGlobalAccountBlocks as false on WMF wikis (T356923 T356924)
  • 21:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T357189)', diff saved to https://phabricator.wikimedia.org/P57640 and previous config saved to /var/cache/conftool/dbconfig/20240221-215620-arnaudb.json
  • 21:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 21:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T357189)', diff saved to https://phabricator.wikimedia.org/P57639 and previous config saved to /var/cache/conftool/dbconfig/20240221-215558-arnaudb.json
  • 21:54 jhuneidi@deploy2002: Finished scap: Backport for cswiki, commonswiki, enwiki: fix IP cap date and IP for WikiGap Editathon (T357978) (duration: 10m 47s)
  • 21:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1034.eqiad.wmnet with reason: Bootstrapping — T354560
  • 21:52 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase1034.eqiad.wmnet with reason: Bootstrapping — T354560
  • 21:51 urandom: bootstrapping Cassandra, restbase1034-{a,b,c} — T354560
  • 21:46 jhuneidi@deploy2002: anzx and jhuneidi: Continuing with sync
  • 21:45 jhuneidi@deploy2002: anzx and jhuneidi: Backport for cswiki, commonswiki, enwiki: fix IP cap date and IP for WikiGap Editathon (T357978) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 21:43 jhuneidi@deploy2002: Started scap: Backport for cswiki, commonswiki, enwiki: fix IP cap date and IP for WikiGap Editathon (T357978)
  • 21:42 jhuneidi@deploy2002: Finished scap: Backport for Remove Japanese Wikipedia from projects sharing user scripts (T301212), Enable night mode on beta cluster (T357759) (duration: 15m 25s)
  • 21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P57638 and previous config saved to /var/cache/conftool/dbconfig/20240221-214052-arnaudb.json
  • 21:34 jhuneidi@deploy2002: jdlrobson and jhuneidi: Continuing with sync
  • 21:32 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 21:31 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 21:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
  • 21:29 jhuneidi@deploy2002: jdlrobson and jhuneidi: Backport for Remove Japanese Wikipedia from projects sharing user scripts (T301212), Enable night mode on beta cluster (T357759) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:27 jhuneidi@deploy2002: Started scap: Backport for Remove Japanese Wikipedia from projects sharing user scripts (T301212), Enable night mode on beta cluster (T357759)
  • 21:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
  • 21:26 jhuneidi@deploy2002: Finished scap: Backport for Turn on Parsoid read views by default on officewiki (T355566) (duration: 15m 19s)
  • 21:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P57637 and previous config saved to /var/cache/conftool/dbconfig/20240221-212546-arnaudb.json
  • 21:24 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 21:24 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 21:19 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:18 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 21:18 jhuneidi@deploy2002: cscott and jhuneidi: Continuing with sync
  • 21:17 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 21:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 21:17 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 21:12 jhuneidi@deploy2002: cscott and jhuneidi: Backport for Turn on Parsoid read views by default on officewiki (T355566) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:11 jhuneidi@deploy2002: Started scap: Backport for Turn on Parsoid read views by default on officewiki (T355566)
  • 21:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T357189)', diff saved to https://phabricator.wikimedia.org/P57636 and previous config saved to /var/cache/conftool/dbconfig/20240221-211039-arnaudb.json
  • 21:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T357189)', diff saved to https://phabricator.wikimedia.org/P57635 and previous config saved to /var/cache/conftool/dbconfig/20240221-210001-arnaudb.json
  • 20:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T357189)', diff saved to https://phabricator.wikimedia.org/P57634 and previous config saved to /var/cache/conftool/dbconfig/20240221-205922-arnaudb.json
  • 20:54 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.19 refs T354437 (duration: 08m 35s)
  • 20:46 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.19 refs T354437
  • 20:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P57633 and previous config saved to /var/cache/conftool/dbconfig/20240221-204415-arnaudb.json
  • 20:39 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:39 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:33 ejegg: turned off nightly recurring charge job for Autorescue deployment
  • 20:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P57632 and previous config saved to /var/cache/conftool/dbconfig/20240221-202906-arnaudb.json
  • 20:16 jhuneidi@deploy2002: scap failed: average error rate on 4/4 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
  • 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T357189)', diff saved to https://phabricator.wikimedia.org/P57631 and previous config saved to /var/cache/conftool/dbconfig/20240221-201400-arnaudb.json
  • 20:11 jhuneidi@deploy2002: Finished scap: Backport for CentralAuthHooks::onGetUserBlock: Only run for reg. users (T358112) (duration: 14m 09s)
  • 20:03 jhuneidi@deploy2002: jhuneidi and matmarex: Continuing with sync
  • 20:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T357189)', diff saved to https://phabricator.wikimedia.org/P57630 and previous config saved to /var/cache/conftool/dbconfig/20240221-200209-arnaudb.json
  • 20:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 20:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T357189)', diff saved to https://phabricator.wikimedia.org/P57629 and previous config saved to /var/cache/conftool/dbconfig/20240221-200148-arnaudb.json
  • 19:58 jhuneidi@deploy2002: jhuneidi and matmarex: Backport for CentralAuthHooks::onGetUserBlock: Only run for reg. users (T358112) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:57 jhuneidi@deploy2002: Started scap: Backport for CentralAuthHooks::onGetUserBlock: Only run for reg. users (T358112)
  • 19:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T355609)', diff saved to https://phabricator.wikimedia.org/P57628 and previous config saved to /var/cache/conftool/dbconfig/20240221-195157-marostegui.json
  • 19:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P57627 and previous config saved to /var/cache/conftool/dbconfig/20240221-194641-arnaudb.json
  • 19:38 inflatador: bking@deploy2002 deleting old flink data from thanos-swift T348685
  • 19:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P57626 and previous config saved to /var/cache/conftool/dbconfig/20240221-193650-marostegui.json
  • 19:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P57625 and previous config saved to /var/cache/conftool/dbconfig/20240221-193135-arnaudb.json
  • 19:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P57624 and previous config saved to /var/cache/conftool/dbconfig/20240221-192144-marostegui.json
  • 19:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T357189)', diff saved to https://phabricator.wikimedia.org/P57623 and previous config saved to /var/cache/conftool/dbconfig/20240221-191628-arnaudb.json
  • 19:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T355609)', diff saved to https://phabricator.wikimedia.org/P57622 and previous config saved to /var/cache/conftool/dbconfig/20240221-190637-marostegui.json
  • 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T357189)', diff saved to https://phabricator.wikimedia.org/P57621 and previous config saved to /var/cache/conftool/dbconfig/20240221-190311-arnaudb.json
  • 19:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 19:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T357189)', diff saved to https://phabricator.wikimedia.org/P57620 and previous config saved to /var/cache/conftool/dbconfig/20240221-190249-arnaudb.json
  • 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P57619 and previous config saved to /var/cache/conftool/dbconfig/20240221-184743-arnaudb.json
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T355609)', diff saved to https://phabricator.wikimedia.org/P57618 and previous config saved to /var/cache/conftool/dbconfig/20240221-184144-marostegui.json
  • 18:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T355609)', diff saved to https://phabricator.wikimedia.org/P57617 and previous config saved to /var/cache/conftool/dbconfig/20240221-184120-marostegui.json
  • 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P57616 and previous config saved to /var/cache/conftool/dbconfig/20240221-183236-arnaudb.json
  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P57615 and previous config saved to /var/cache/conftool/dbconfig/20240221-182614-marostegui.json
  • 18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T357189)', diff saved to https://phabricator.wikimedia.org/P57614 and previous config saved to /var/cache/conftool/dbconfig/20240221-181729-arnaudb.json
  • 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P57613 and previous config saved to /var/cache/conftool/dbconfig/20240221-181107-marostegui.json
  • 18:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T357189)', diff saved to https://phabricator.wikimedia.org/P57612 and previous config saved to /var/cache/conftool/dbconfig/20240221-180103-arnaudb.json
  • 18:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T357189)', diff saved to https://phabricator.wikimedia.org/P57611 and previous config saved to /var/cache/conftool/dbconfig/20240221-180041-arnaudb.json
  • 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T355609)', diff saved to https://phabricator.wikimedia.org/P57610 and previous config saved to /var/cache/conftool/dbconfig/20240221-175601-marostegui.json
  • 17:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P57609 and previous config saved to /var/cache/conftool/dbconfig/20240221-174534-arnaudb.json
  • 17:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P57608 and previous config saved to /var/cache/conftool/dbconfig/20240221-173028-arnaudb.json
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T355609)', diff saved to https://phabricator.wikimedia.org/P57607 and previous config saved to /var/cache/conftool/dbconfig/20240221-172731-marostegui.json
  • 17:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 17:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T355609)', diff saved to https://phabricator.wikimedia.org/P57606 and previous config saved to /var/cache/conftool/dbconfig/20240221-172709-marostegui.json
  • 17:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T357189)', diff saved to https://phabricator.wikimedia.org/P57605 and previous config saved to /var/cache/conftool/dbconfig/20240221-171521-arnaudb.json
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P57604 and previous config saved to /var/cache/conftool/dbconfig/20240221-171203-marostegui.json
  • 17:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bullseye
  • 17:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2120 (T357189)', diff saved to https://phabricator.wikimedia.org/P57603 and previous config saved to /var/cache/conftool/dbconfig/20240221-170157-arnaudb.json
  • 17:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 17:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T357189)', diff saved to https://phabricator.wikimedia.org/P57602 and previous config saved to /var/cache/conftool/dbconfig/20240221-170134-arnaudb.json
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P57601 and previous config saved to /var/cache/conftool/dbconfig/20240221-165657-marostegui.json
  • 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2106 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57600 and previous config saved to /var/cache/conftool/dbconfig/20240221-165651-arnaudb.json
  • 16:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57599 and previous config saved to /var/cache/conftool/dbconfig/20240221-165644-arnaudb.json
  • 16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P57598 and previous config saved to /var/cache/conftool/dbconfig/20240221-164628-arnaudb.json
  • 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T355609)', diff saved to https://phabricator.wikimedia.org/P57597 and previous config saved to /var/cache/conftool/dbconfig/20240221-164150-marostegui.json
  • 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2106 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57596 and previous config saved to /var/cache/conftool/dbconfig/20240221-164146-arnaudb.json
  • 16:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57595 and previous config saved to /var/cache/conftool/dbconfig/20240221-164140-arnaudb.json
  • 16:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57594 and previous config saved to /var/cache/conftool/dbconfig/20240221-163433-root.json
  • 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P57593 and previous config saved to /var/cache/conftool/dbconfig/20240221-163122-arnaudb.json
  • 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2106 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57592 and previous config saved to /var/cache/conftool/dbconfig/20240221-162641-arnaudb.json
  • 16:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57591 and previous config saved to /var/cache/conftool/dbconfig/20240221-162635-arnaudb.json
  • 16:25 claime: Uncordoning kubernetes2025.codfw.wmnet kubernetes2026.codfw.wmnet following codfw A8 network migration - T355874
  • 16:24 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=parse200(4|5).*
  • 16:24 claime: Repooling parse2004.codfw.wmnet parse2005.codfw.wmnet following codfw A8 network migration - T355874
  • 16:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57590 and previous config saved to /var/cache/conftool/dbconfig/20240221-161928-root.json
  • 16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T357189)', diff saved to https://phabricator.wikimedia.org/P57589 and previous config saved to /var/cache/conftool/dbconfig/20240221-161615-arnaudb.json
  • 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T355609)', diff saved to https://phabricator.wikimedia.org/P57588 and previous config saved to /var/cache/conftool/dbconfig/20240221-161407-marostegui.json
  • 16:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T355609)', diff saved to https://phabricator.wikimedia.org/P57587 and previous config saved to /var/cache/conftool/dbconfig/20240221-161345-marostegui.json
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2106 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57586 and previous config saved to /var/cache/conftool/dbconfig/20240221-161136-arnaudb.json
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2146 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57585 and previous config saved to /var/cache/conftool/dbconfig/20240221-161129-arnaudb.json
  • 16:09 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2137.codfw.wmnet with OS bookworm
  • 16:06 jayme: imported prometheus-rsyslog-exporter 1.0.0+git20221110-1 to buster,bullseye,bookworm - T357616
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2108 (T357189)', diff saved to https://phabricator.wikimedia.org/P57584 and previous config saved to /var/cache/conftool/dbconfig/20240221-160511-arnaudb.json
  • 16:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 16:04 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 16:04 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57583 and previous config saved to /var/cache/conftool/dbconfig/20240221-160423-root.json
  • 16:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 16:03 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 16:02 topranks: Commencing network maintenance migrating servers to new switch codfw rack A8 T355874
  • 15:59 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: Migrating servers in codfw rack A7 to lsw1-a7-codfw
  • 15:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: Migrating servers in codfw rack A7 to lsw1-a7-codfw
  • 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P57582 and previous config saved to /var/cache/conftool/dbconfig/20240221-155839-marostegui.json
  • 15:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a8-codfw.mgmt with reason: prepping for server uplink migration codfw rack a8
  • 15:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a8-codfw.mgmt with reason: prepping for server uplink migration codfw rack a8
  • 15:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:55 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:52 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 15:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57581 and previous config saved to /var/cache/conftool/dbconfig/20240221-154918-root.json
  • 15:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2137.codfw.wmnet with reason: host reimage
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:44 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2137.codfw.wmnet with reason: host reimage
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P57580 and previous config saved to /var/cache/conftool/dbconfig/20240221-154333-marostegui.json
  • 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on db2106.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on db2106.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on db2146.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on db2146.codfw.wmnet with reason: T355874 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:40 arnaudb@cumin1002: dbctl commit (dc=all): 'T355874 - depooling db2146 db2106', diff saved to https://phabricator.wikimedia.org/P57579 and previous config saved to /var/cache/conftool/dbconfig/20240221-154056-arnaudb.json
  • 15:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 15:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 15:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T357189)', diff saved to https://phabricator.wikimedia.org/P57578 and previous config saved to /var/cache/conftool/dbconfig/20240221-153926-arnaudb.json
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57577 and previous config saved to /var/cache/conftool/dbconfig/20240221-153414-root.json
  • 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T355609)', diff saved to https://phabricator.wikimedia.org/P57576 and previous config saved to /var/cache/conftool/dbconfig/20240221-152826-marostegui.json
  • 15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P57575 and previous config saved to /var/cache/conftool/dbconfig/20240221-152420-arnaudb.json
  • 15:21 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host db2137.codfw.wmnet with OS bookworm
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57574 and previous config saved to /var/cache/conftool/dbconfig/20240221-151909-root.json
  • 15:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:12 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:10 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:10 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for private1-b-codfw - cmooney@cumin1002"
  • 14:55 TheresNoTime: UTC afternoon backport window done
  • 14:54 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:54 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for private1-a-codfw - cmooney@cumin1002"
  • 14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T357189)', diff saved to https://phabricator.wikimedia.org/P57570 and previous config saved to /var/cache/conftool/dbconfig/20240221-145407-arnaudb.json
  • 14:53 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for private1-a-codfw - cmooney@cumin1002"
  • 14:53 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:52 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:49 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:47 TheresNoTime: [samtar@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki hewikinews --fix #T349581
  • 14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T357189)', diff saved to https://phabricator.wikimedia.org/P57569 and previous config saved to /var/cache/conftool/dbconfig/20240221-144702-arnaudb.json
  • 14:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 14:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T357189)', diff saved to https://phabricator.wikimedia.org/P57568 and previous config saved to /var/cache/conftool/dbconfig/20240221-144641-arnaudb.json
  • 14:46 samtar@deploy2002: Finished scap: Backport for cswiki, commonswiki, enwiki: Lift IP cap for WikiGap Editathon, mywiki: create portal and draft namespace (T352424) (duration: 20m 23s)
  • 14:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P57567 and previous config saved to /var/cache/conftool/dbconfig/20240221-144536-marostegui.json
  • 14:44 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2026.codfw.wmnet with reason: host reimage
  • 14:44 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:42 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2026.codfw.wmnet with reason: host reimage
  • 14:40 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 14:38 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for private1-a-codfw - cmooney@cumin1002"
  • 14:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for private1-a-codfw - cmooney@cumin1002"
  • 14:37 samtar@deploy2002: samtar and anzx: Continuing with sync
  • 14:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:33 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:33 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:33 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57566 and previous config saved to /var/cache/conftool/dbconfig/20240221-143239-arnaudb.json
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P57565 and previous config saved to /var/cache/conftool/dbconfig/20240221-143133-arnaudb.json
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57564 and previous config saved to /var/cache/conftool/dbconfig/20240221-143120-arnaudb.json
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P57563 and previous config saved to /var/cache/conftool/dbconfig/20240221-143030-marostegui.json
  • 14:27 samtar@deploy2002: samtar and anzx: Backport for cswiki, commonswiki, enwiki: Lift IP cap for WikiGap Editathon, mywiki: create portal and draft namespace (T352424) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 samtar@deploy2002: Started scap: Backport for cswiki, commonswiki, enwiki: Lift IP cap for WikiGap Editathon, mywiki: create portal and draft namespace (T352424)
  • 14:24 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host es2026.codfw.wmnet with OS bookworm
  • 14:23 samtar@deploy2002: Finished scap: Backport for zhwiki: Create group ipblock-exempt-grantor (T357991) (duration: 11m 05s)
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new apt server in codfw - jmm@cumin2002 - T331613"
  • 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new apt server in codfw - jmm@cumin2002 - T331613"
  • 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57562 and previous config saved to /var/cache/conftool/dbconfig/20240221-141734-arnaudb.json
  • 14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P57561 and previous config saved to /var/cache/conftool/dbconfig/20240221-141627-arnaudb.json
  • 14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57560 and previous config saved to /var/cache/conftool/dbconfig/20240221-141615-arnaudb.json
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T355609)', diff saved to https://phabricator.wikimedia.org/P57559 and previous config saved to /var/cache/conftool/dbconfig/20240221-141523-marostegui.json
  • 14:15 samtar@deploy2002: stang and samtar: Continuing with sync
  • 14:13 samtar@deploy2002: stang and samtar: Backport for zhwiki: Create group ipblock-exempt-grantor (T357991) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:12 samtar@deploy2002: Started scap: Backport for zhwiki: Create group ipblock-exempt-grantor (T357991)
  • 14:10 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 14:08 claime: restarted ferm.service on kubernetes2055.codfw.wmnet mw2440.codfw.wmnet mw2297.codfw.wmnet kubernetes2016.codfw.wmnet - T354855
  • 14:07 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 14:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host apt2002.wikimedia.org
  • 14:05 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apt2002.wikimedia.org with OS bookworm
  • 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57558 and previous config saved to /var/cache/conftool/dbconfig/20240221-140229-arnaudb.json
  • 14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T357189)', diff saved to https://phabricator.wikimedia.org/P57557 and previous config saved to /var/cache/conftool/dbconfig/20240221-140120-arnaudb.json
  • 14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57556 and previous config saved to /var/cache/conftool/dbconfig/20240221-140110-arnaudb.json
  • 13:59 topranks: adding IRB anycast interface on private1-a-codfw vlan to lsw1-a4-codfw
  • 13:50 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T357189)', diff saved to https://phabricator.wikimedia.org/P57555 and previous config saved to /var/cache/conftool/dbconfig/20240221-135031-arnaudb.json
  • 13:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 13:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T357189)', diff saved to https://phabricator.wikimedia.org/P57554 and previous config saved to /var/cache/conftool/dbconfig/20240221-135009-arnaudb.json
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57553 and previous config saved to /var/cache/conftool/dbconfig/20240221-134724-arnaudb.json
  • 13:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2142.codfw.wmnet
  • 13:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57552 and previous config saved to /var/cache/conftool/dbconfig/20240221-134605-arnaudb.json
  • 13:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1180.eqiad.wmnet
  • 13:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1213.eqiad.wmnet
  • 13:41 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2142.codfw.wmnet
  • 13:41 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1213.eqiad.wmnet
  • 13:40 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1180.eqiad.wmnet
  • 13:40 Dreamy_Jazz: Re-started MediaModeration scanning script using `mwscript extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=commonswiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-commonswiki-sleep-30-no-render-now.txt` - See T351400
  • 13:40 arnaudb@cumin1002: dbctl commit (dc=all): 'T356240 - depooling db1180 db1213 db2142', diff saved to https://phabricator.wikimedia.org/P57551 and previous config saved to /var/cache/conftool/dbconfig/20240221-134015-arnaudb.json
  • 13:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db2142.codfw.wmnet,db[1180,1213].eqiad.wmnet with reason: Silence for reboot T356240
  • 13:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db2142.codfw.wmnet,db[1180,1213].eqiad.wmnet with reason: Silence for reboot T356240
  • 13:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P57550 and previous config saved to /var/cache/conftool/dbconfig/20240221-133503-arnaudb.json
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt2002.wikimedia.org with reason: host reimage
  • 13:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apt2002.wikimedia.org with reason: host reimage
  • 13:22 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:22 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T355609)', diff saved to https://phabricator.wikimedia.org/P57549 and previous config saved to /var/cache/conftool/dbconfig/20240221-132156-marostegui.json
  • 13:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T355609)', diff saved to https://phabricator.wikimedia.org/P57548 and previous config saved to /var/cache/conftool/dbconfig/20240221-132134-marostegui.json
  • 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P57547 and previous config saved to /var/cache/conftool/dbconfig/20240221-131957-arnaudb.json
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host apt2002.wikimedia.org with OS bookworm
  • 13:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt2002.wikimedia.org - jmm@cumin2002"
  • 13:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt2002.wikimedia.org - jmm@cumin2002"
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt2002.wikimedia.org on all recursors
  • 13:14 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache apt2002.wikimedia.org on all recursors
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 13:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt2002.wikimedia.org - jmm@cumin2002"
  • 13:11 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host apt2002.wikimedia.org
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 13:08 samtar@deploy2002: Finished scap: Backport for InitialiseSettings: Enable Edit Recovery on 3 projects (T355548) (duration: 14m 36s)
  • 13:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P57546 and previous config saved to /var/cache/conftool/dbconfig/20240221-130628-marostegui.json
  • 13:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T357189)', diff saved to https://phabricator.wikimedia.org/P57545 and previous config saved to /var/cache/conftool/dbconfig/20240221-130450-arnaudb.json
  • 13:03 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudvirt1033 - aborrero@cumin1002"
  • 13:02 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudvirt1033 - aborrero@cumin1002"
  • 13:00 samtar@deploy2002: samtar: Continuing with sync
  • 12:57 Daimona: T357007 Running mwscript /home/daimona/GenerateInvitationList.php --wiki=metawiki --listfile=/home/daimona/list.txt (same as current master)
  • 12:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T357189)', diff saved to https://phabricator.wikimedia.org/P57544 and previous config saved to /var/cache/conftool/dbconfig/20240221-125711-arnaudb.json
  • 12:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 12:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T357189)', diff saved to https://phabricator.wikimedia.org/P57543 and previous config saved to /var/cache/conftool/dbconfig/20240221-125648-arnaudb.json
  • 12:55 samtar@deploy2002: samtar: Backport for InitialiseSettings: Enable Edit Recovery on 3 projects (T355548) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:53 samtar@deploy2002: Started scap: Backport for InitialiseSettings: Enable Edit Recovery on 3 projects (T355548)
  • 12:52 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P57542 and previous config saved to /var/cache/conftool/dbconfig/20240221-125121-marostegui.json
  • 12:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P57541 and previous config saved to /var/cache/conftool/dbconfig/20240221-124142-arnaudb.json
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T355609)', diff saved to https://phabricator.wikimedia.org/P57540 and previous config saved to /var/cache/conftool/dbconfig/20240221-123615-marostegui.json
  • 12:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57539 and previous config saved to /var/cache/conftool/dbconfig/20240221-123439-arnaudb.json
  • 12:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57538 and previous config saved to /var/cache/conftool/dbconfig/20240221-123423-arnaudb.json
  • 12:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57537 and previous config saved to /var/cache/conftool/dbconfig/20240221-123410-arnaudb.json
  • 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P57536 and previous config saved to /var/cache/conftool/dbconfig/20240221-122636-arnaudb.json
  • 12:24 akosiaris@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-parsoid,name=codfw
  • 12:24 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 12:22 kart_: Updated cxserver to 2024-02-21-112101-production (T357769)
  • 12:21 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 12:21 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2026.codfw.wmnet with OS bookworm
  • 12:20 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:20 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:20 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:20 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:20 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:20 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57535 and previous config saved to /var/cache/conftool/dbconfig/20240221-121934-arnaudb.json
  • 12:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57534 and previous config saved to /var/cache/conftool/dbconfig/20240221-121918-arnaudb.json
  • 12:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57533 and previous config saved to /var/cache/conftool/dbconfig/20240221-121906-arnaudb.json
  • 12:18 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:18 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:15 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:15 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2026.codfw.wmnet with OS bookworm
  • 12:15 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2026.codfw.wmnet with OS bookworm
  • 12:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:14 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:13 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:12 claime: mw-page-content-change-enrich: Switch to mw-api-int-async - T357785
  • 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T357189)', diff saved to https://phabricator.wikimedia.org/P57532 and previous config saved to /var/cache/conftool/dbconfig/20240221-121129-arnaudb.json
  • 12:10 akosiaris: restart pybal on lvs2013, lvs1019 to pick up mw-parsoid service. T357392
  • 12:09 aborrero@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1033
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T355609)', diff saved to https://phabricator.wikimedia.org/P57531 and previous config saved to /var/cache/conftool/dbconfig/20240221-120949-marostegui.json
  • 12:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:09 aborrero@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1033
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T355609)', diff saved to https://phabricator.wikimedia.org/P57530 and previous config saved to /var/cache/conftool/dbconfig/20240221-120927-marostegui.json
  • 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57529 and previous config saved to /var/cache/conftool/dbconfig/20240221-120429-arnaudb.json
  • 12:05 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 12:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57528 and previous config saved to /var/cache/conftool/dbconfig/20240221-120414-arnaudb.json
  • 12:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57527 and previous config saved to /var/cache/conftool/dbconfig/20240221-120401-arnaudb.json
  • 12:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T357189)', diff saved to https://phabricator.wikimedia.org/P57526 and previous config saved to /var/cache/conftool/dbconfig/20240221-120345-arnaudb.json
  • 12:04 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2026.codfw.wmnet with OS bookworm
  • 12:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T357189)', diff saved to https://phabricator.wikimedia.org/P57525 and previous config saved to /var/cache/conftool/dbconfig/20240221-120324-arnaudb.json
  • 12:02 akosiaris: restart pybal on lvs2014 to pick up mw-parsoid service. T357392
  • 12:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2026 T358080', diff saved to https://phabricator.wikimedia.org/P57524 and previous config saved to /var/cache/conftool/dbconfig/20240221-120202-root.json
  • 12:01 akosiaris: restart pybal on lvs1020 to pick up mw-parsoid service. T357392
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57523 and previous config saved to /var/cache/conftool/dbconfig/20240221-120051-root.json
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P57522 and previous config saved to /var/cache/conftool/dbconfig/20240221-115421-marostegui.json
  • 11:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57521 and previous config saved to /var/cache/conftool/dbconfig/20240221-114925-arnaudb.json
  • 11:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2192 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57520 and previous config saved to /var/cache/conftool/dbconfig/20240221-114909-arnaudb.json
  • 11:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2191 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57519 and previous config saved to /var/cache/conftool/dbconfig/20240221-114856-arnaudb.json
  • 11:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P57518 and previous config saved to /var/cache/conftool/dbconfig/20240221-114817-arnaudb.json
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57517 and previous config saved to /var/cache/conftool/dbconfig/20240221-114546-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P57516 and previous config saved to /var/cache/conftool/dbconfig/20240221-113914-marostegui.json
  • 11:36 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:36 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added cassandra IPs for restbase10[34-42] - volans@cumin1002"
  • 11:35 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Added cassandra IPs for restbase10[34-42] - volans@cumin1002"
  • 11:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P57515 and previous config saved to /var/cache/conftool/dbconfig/20240221-113311-arnaudb.json
  • 11:32 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 11:32 volans@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Added cassandra IPs for restbase10[34-42] - volans@cumin1002"
  • 11:32 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Added cassandra IPs for restbase10[34-42] - volans@cumin1002"
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57514 and previous config saved to /var/cache/conftool/dbconfig/20240221-113041-root.json
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T355609)', diff saved to https://phabricator.wikimedia.org/P57513 and previous config saved to /var/cache/conftool/dbconfig/20240221-112408-marostegui.json
  • 11:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T357189)', diff saved to https://phabricator.wikimedia.org/P57512 and previous config saved to /var/cache/conftool/dbconfig/20240221-111805-arnaudb.json
  • 11:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1151.eqiad.wmnet
  • 11:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2191.codfw.wmnet
  • 11:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2192.codfw.wmnet
  • 11:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2193.codfw.wmnet
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57511 and previous config saved to /var/cache/conftool/dbconfig/20240221-111536-root.json
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57510 and previous config saved to /var/cache/conftool/dbconfig/20240221-111348-root.json
  • 11:13 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2192.codfw.wmnet
  • 11:12 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2193.codfw.wmnet
  • 11:12 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1151.eqiad.wmnet
  • 11:12 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2191.codfw.wmnet
  • 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'T356240 - depooling db2191 db2192 db2193 db1151', diff saved to https://phabricator.wikimedia.org/P57508 and previous config saved to /var/cache/conftool/dbconfig/20240221-111023-arnaudb.json
  • 11:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2191-2193].codfw.wmnet,db1151.eqiad.wmnet with reason: Silence for reboot T356240
  • 11:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T357189)', diff saved to https://phabricator.wikimedia.org/P57507 and previous config saved to /var/cache/conftool/dbconfig/20240221-111012-arnaudb.json
  • 11:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 11:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2191-2193].codfw.wmnet,db1151.eqiad.wmnet with reason: Silence for reboot T356240
  • 11:10 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 11:10 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 11:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T357189)', diff saved to https://phabricator.wikimedia.org/P57506 and previous config saved to /var/cache/conftool/dbconfig/20240221-110951-arnaudb.json
  • 11:09 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1001.eqiad.wmnet
  • 11:08 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 11:08 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 11:07 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 11:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver1001.eqiad.wmnet
  • 11:05 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 11:05 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver2002.codfw.wmnet
  • 11:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver2002.codfw.wmnet
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57505 and previous config saved to /var/cache/conftool/dbconfig/20240221-110031-root.json
  • 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57504 and previous config saved to /var/cache/conftool/dbconfig/20240221-105844-root.json
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T355609)', diff saved to https://phabricator.wikimedia.org/P57503 and previous config saved to /var/cache/conftool/dbconfig/20240221-105654-marostegui.json
  • 10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T355609)', diff saved to https://phabricator.wikimedia.org/P57502 and previous config saved to /var/cache/conftool/dbconfig/20240221-105630-marostegui.json
  • 10:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P57501 and previous config saved to /var/cache/conftool/dbconfig/20240221-105445-arnaudb.json
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57500 and previous config saved to /var/cache/conftool/dbconfig/20240221-104526-root.json
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57499 and previous config saved to /var/cache/conftool/dbconfig/20240221-104339-root.json
  • 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P57498 and previous config saved to /var/cache/conftool/dbconfig/20240221-104124-marostegui.json
  • 10:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P57497 and previous config saved to /var/cache/conftool/dbconfig/20240221-103938-arnaudb.json
  • 10:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:36 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:35 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2031.codfw.wmnet with OS bookworm
  • 10:34 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:34 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:32 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:32 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57496 and previous config saved to /var/cache/conftool/dbconfig/20240221-102833-root.json
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P57495 and previous config saved to /var/cache/conftool/dbconfig/20240221-102618-marostegui.json
  • 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T357189)', diff saved to https://phabricator.wikimedia.org/P57494 and previous config saved to /var/cache/conftool/dbconfig/20240221-102432-arnaudb.json
  • 10:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T357189)', diff saved to https://phabricator.wikimedia.org/P57493 and previous config saved to /var/cache/conftool/dbconfig/20240221-101646-arnaudb.json
  • 10:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2031.codfw.wmnet with reason: host reimage
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57492 and previous config saved to /var/cache/conftool/dbconfig/20240221-101328-root.json
  • 10:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2031.codfw.wmnet with reason: host reimage
  • 10:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bookworm
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T355609)', diff saved to https://phabricator.wikimedia.org/P57491 and previous config saved to /var/cache/conftool/dbconfig/20240221-101111-marostegui.json
  • 10:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T357189)', diff saved to https://phabricator.wikimedia.org/P57490 and previous config saved to /var/cache/conftool/dbconfig/20240221-100815-arnaudb.json
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57489 and previous config saved to /var/cache/conftool/dbconfig/20240221-095823-root.json
  • 09:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 09:53 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 09:53 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2031.codfw.wmnet with OS bookworm
  • 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P57488 and previous config saved to /var/cache/conftool/dbconfig/20240221-095309-arnaudb.json
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2031 T358080', diff saved to https://phabricator.wikimedia.org/P57487 and previous config saved to /var/cache/conftool/dbconfig/20240221-095205-root.json
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T355609)', diff saved to https://phabricator.wikimedia.org/P57486 and previous config saved to /var/cache/conftool/dbconfig/20240221-094516-marostegui.json
  • 09:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57485 and previous config saved to /var/cache/conftool/dbconfig/20240221-094319-root.json
  • 09:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1030.eqiad.wmnet with OS bookworm
  • 09:40 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bookworm
  • 09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P57484 and previous config saved to /var/cache/conftool/dbconfig/20240221-093802-arnaudb.json
  • 09:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1030.eqiad.wmnet with reason: host reimage
  • 09:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1030.eqiad.wmnet with reason: host reimage
  • 09:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T357189)', diff saved to https://phabricator.wikimedia.org/P57482 and previous config saved to /var/cache/conftool/dbconfig/20240221-092256-arnaudb.json
  • 09:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T355609)', diff saved to https://phabricator.wikimedia.org/P57481 and previous config saved to /var/cache/conftool/dbconfig/20240221-092251-marostegui.json
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57480 and previous config saved to /var/cache/conftool/dbconfig/20240221-091531-arnaudb.json
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57479 and previous config saved to /var/cache/conftool/dbconfig/20240221-091521-arnaudb.json
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57478 and previous config saved to /var/cache/conftool/dbconfig/20240221-091509-arnaudb.json
  • 09:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57477 and previous config saved to /var/cache/conftool/dbconfig/20240221-091449-arnaudb.json
  • 09:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T357189)', diff saved to https://phabricator.wikimedia.org/P57476 and previous config saved to /var/cache/conftool/dbconfig/20240221-091358-arnaudb.json
  • 09:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T357189)', diff saved to https://phabricator.wikimedia.org/P57475 and previous config saved to /var/cache/conftool/dbconfig/20240221-091337-arnaudb.json
  • 09:10 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1030.eqiad.wmnet with OS bookworm
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1030 T358080', diff saved to https://phabricator.wikimedia.org/P57474 and previous config saved to /var/cache/conftool/dbconfig/20240221-090957-root.json
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P57473 and previous config saved to /var/cache/conftool/dbconfig/20240221-090744-marostegui.json
  • 09:06 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 09:06 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 09:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57472 and previous config saved to /var/cache/conftool/dbconfig/20240221-090026-arnaudb.json
  • 09:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57471 and previous config saved to /var/cache/conftool/dbconfig/20240221-090016-arnaudb.json
  • 09:00 hashar: Restarted CI Jenkins on contint2002 to update the timestamper plugin
  • 09:00 arnaudb@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57470 and previous config saved to /var/cache/conftool/dbconfig/20240221-090004-arnaudb.json
  • 08:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57469 and previous config saved to /var/cache/conftool/dbconfig/20240221-085944-arnaudb.json
  • 08:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P57468 and previous config saved to /var/cache/conftool/dbconfig/20240221-085830-arnaudb.json
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P57467 and previous config saved to /var/cache/conftool/dbconfig/20240221-085238-marostegui.json
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57466 and previous config saved to /var/cache/conftool/dbconfig/20240221-084521-arnaudb.json
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57465 and previous config saved to /var/cache/conftool/dbconfig/20240221-084511-arnaudb.json
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57464 and previous config saved to /var/cache/conftool/dbconfig/20240221-084459-arnaudb.json
  • 08:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57463 and previous config saved to /var/cache/conftool/dbconfig/20240221-084440-arnaudb.json
  • 08:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P57462 and previous config saved to /var/cache/conftool/dbconfig/20240221-084325-arnaudb.json
  • 08:43 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest2005.codfw.wmnet
  • 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:41 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T355609)', diff saved to https://phabricator.wikimedia.org/P57461 and previous config saved to /var/cache/conftool/dbconfig/20240221-083731-marostegui.json
  • 08:36 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts sretest2005.codfw.wmnet
  • 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57460 and previous config saved to /var/cache/conftool/dbconfig/20240221-083016-arnaudb.json
  • 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57459 and previous config saved to /var/cache/conftool/dbconfig/20240221-083006-arnaudb.json
  • 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2188 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57458 and previous config saved to /var/cache/conftool/dbconfig/20240221-082955-arnaudb.json
  • 08:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57457 and previous config saved to /var/cache/conftool/dbconfig/20240221-082935-arnaudb.json
  • 08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2180.codfw.wmnet
  • 08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2190.codfw.wmnet
  • 08:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T357189)', diff saved to https://phabricator.wikimedia.org/P57456 and previous config saved to /var/cache/conftool/dbconfig/20240221-082818-arnaudb.json
  • 08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2189.codfw.wmnet
  • 08:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2188.codfw.wmnet
  • 08:23 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2189.codfw.wmnet
  • 08:23 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2190.codfw.wmnet
  • 08:23 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2188.codfw.wmnet
  • 08:23 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2180.codfw.wmnet
  • 08:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 db2188 db2189 db2190 depool for T356240', diff saved to https://phabricator.wikimedia.org/P57455 and previous config saved to /var/cache/conftool/dbconfig/20240221-082219-arnaudb.json
  • 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[2180,2188-2190].codfw.wmnet with reason: Silence for reboot T356240
  • 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[2180,2188-2190].codfw.wmnet with reason: Silence for reboot T356240
  • 08:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T357189)', diff saved to https://phabricator.wikimedia.org/P57454 and previous config saved to /var/cache/conftool/dbconfig/20240221-082029-arnaudb.json
  • 08:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T355609)', diff saved to https://phabricator.wikimedia.org/P57452 and previous config saved to /var/cache/conftool/dbconfig/20240221-080836-marostegui.json
  • 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T355609)', diff saved to https://phabricator.wikimedia.org/P57451 and previous config saved to /var/cache/conftool/dbconfig/20240221-080814-marostegui.json
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P57450 and previous config saved to /var/cache/conftool/dbconfig/20240221-075307-marostegui.json
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57449 and previous config saved to /var/cache/conftool/dbconfig/20240221-074452-root.json
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P57448 and previous config saved to /var/cache/conftool/dbconfig/20240221-073801-marostegui.json
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57447 and previous config saved to /var/cache/conftool/dbconfig/20240221-072948-root.json
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T355609)', diff saved to https://phabricator.wikimedia.org/P57446 and previous config saved to /var/cache/conftool/dbconfig/20240221-072255-marostegui.json
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57445 and previous config saved to /var/cache/conftool/dbconfig/20240221-071443-root.json
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57444 and previous config saved to /var/cache/conftool/dbconfig/20240221-065938-root.json
  • 06:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T355609)', diff saved to https://phabricator.wikimedia.org/P57443 and previous config saved to /var/cache/conftool/dbconfig/20240221-065508-marostegui.json
  • 06:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T355609)', diff saved to https://phabricator.wikimedia.org/P57442 and previous config saved to /var/cache/conftool/dbconfig/20240221-065447-marostegui.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57441 and previous config saved to /var/cache/conftool/dbconfig/20240221-064433-root.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P57440 and previous config saved to /var/cache/conftool/dbconfig/20240221-063940-marostegui.json
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57439 and previous config saved to /var/cache/conftool/dbconfig/20240221-062928-root.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P57438 and previous config saved to /var/cache/conftool/dbconfig/20240221-062434-marostegui.json
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57437 and previous config saved to /var/cache/conftool/dbconfig/20240221-061325-root.json
  • 06:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1026.eqiad.wmnet with OS bookworm
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T355609)', diff saved to https://phabricator.wikimedia.org/P57436 and previous config saved to /var/cache/conftool/dbconfig/20240221-060928-marostegui.json
  • 05:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1026.eqiad.wmnet with reason: host reimage
  • 05:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1026.eqiad.wmnet with reason: host reimage
  • 05:45 kart_: Updated MinT to 2024-02-20-062448-production (T333969, T354666)
  • 05:42 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2103 (T355609)', diff saved to https://phabricator.wikimedia.org/P57435 and previous config saved to /var/cache/conftool/dbconfig/20240221-054136-marostegui.json
  • 05:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1026.eqiad.wmnet with OS bookworm
  • 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1026 T358080', diff saved to https://phabricator.wikimedia.org/P57434 and previous config saved to /var/cache/conftool/dbconfig/20240221-053822-root.json
  • 05:33 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:21 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 05:21 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 05:21 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:14 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:13 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:13 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:09 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2220.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2219.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2217.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2218.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2216.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2220.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:41 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2220 to codfw - jhancock@cumin2002"
  • 04:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2220 to codfw - jhancock@cumin2002"
  • 04:39 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2215.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2219.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2219 to codfw - jhancock@cumin2002"
  • 04:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2219 to codfw - jhancock@cumin2002"
  • 04:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:31 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 04:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2218.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2218 to codfw - jhancock@cumin2002"
  • 04:30 rzl@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 04:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2218 to codfw - jhancock@cumin2002"
  • 04:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2217.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2217 to codfw - jhancock@cumin2002"
  • 04:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2217 to codfw - jhancock@cumin2002"
  • 04:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2214.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2216.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2213.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2216 to codfw - jhancock@cumin2002"
  • 04:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2216 to codfw - jhancock@cumin2002"
  • 04:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2212.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2215.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:15 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:15 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2215 to codfw - jhancock@cumin2002"
  • 04:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2215 to codfw - jhancock@cumin2002"
  • 04:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2211.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2214.mgmt.codfw.wmnet with reboot policy FORCED
  • 04:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2214 to codfw - jhancock@cumin2002"
  • 04:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2214 to codfw - jhancock@cumin2002"
  • 04:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 04:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2213 to codfw - jhancock@cumin2002"
  • 03:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2213 to codfw - jhancock@cumin2002"
  • 03:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2212.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 03:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2212 to codfw - jhancock@cumin2002"
  • 03:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2212 to codfw - jhancock@cumin2002"
  • 03:55 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 03:54 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 03:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 03:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2209.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2211.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2210.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2211 to codfw - jhancock@cumin2002"
  • 03:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2211 to codfw - jhancock@cumin2002"
  • 03:39 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 03:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2210.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2210 to codfw - jhancock@cumin2002"
  • 03:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2210 to codfw - jhancock@cumin2002"
  • 03:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 03:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2209.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2209 to codfw - jhancock@cumin2002"
  • 03:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2209 to codfw - jhancock@cumin2002"
  • 03:29 rzl@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 03:28 rzl@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 03:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 03:26 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 03:26 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 03:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2208.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2206.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2207.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2208.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2207.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2208.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2207.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2208.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2207.mgmt.codfw.wmnet with reboot policy FORCED
  • 03:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2206.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2206 to codfw - jhancock@cumin2002"
  • 02:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2206 to codfw - jhancock@cumin2002"
  • 02:56 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2204.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2205.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2204.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2205.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:23 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 02:22 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 02:20 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 02:20 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 02:11 rzl@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 02:10 rzl@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 00:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 00:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance

2024-02-20

  • 23:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 23:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T355609)', diff saved to https://phabricator.wikimedia.org/P57433 and previous config saved to /var/cache/conftool/dbconfig/20240220-233832-marostegui.json
  • 23:25 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:24 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:24 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:24 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P57432 and previous config saved to /var/cache/conftool/dbconfig/20240220-232326-marostegui.json
  • 23:23 logmsgbot: @deploy2002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:23 logmsgbot: @deploy2002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P57431 and previous config saved to /var/cache/conftool/dbconfig/20240220-230817-marostegui.json
  • 22:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T355609)', diff saved to https://phabricator.wikimedia.org/P57430 and previous config saved to /var/cache/conftool/dbconfig/20240220-225311-marostegui.json
  • 22:52 sfaci: Deployed refinery using scap, then deployed onto hdfs
  • 22:39 sfaci@deploy2002: Finished deploy [analytics/refinery@d078656] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d0786561] (duration: 03m 29s)
  • 22:36 sfaci@deploy2002: Started deploy [analytics/refinery@d078656] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d0786561]
  • 22:36 sfaci@deploy2002: Finished deploy [analytics/refinery@d078656] (thin): Regular analytics weekly train THIN [analytics/refinery@d0786561] (duration: 00m 05s)
  • 22:35 sfaci@deploy2002: Started deploy [analytics/refinery@d078656] (thin): Regular analytics weekly train THIN [analytics/refinery@d0786561]
  • 22:35 sfaci@deploy2002: Finished deploy [analytics/refinery@d078656]: Regular analytics weekly train [analytics/refinery@d0786561] (duration: 00m 21s)
  • 22:35 sfaci@deploy2002: Started deploy [analytics/refinery@d078656]: Regular analytics weekly train [analytics/refinery@d0786561]
  • 22:34 sfaci@deploy2002: Finished deploy [analytics/refinery@d078656]: Regular analytics weekly train [analytics/refinery@d0786561] (duration: 13m 19s)
  • 22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T355609)', diff saved to https://phabricator.wikimedia.org/P57429 and previous config saved to /var/cache/conftool/dbconfig/20240220-222445-marostegui.json
  • 22:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 22:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T355609)', diff saved to https://phabricator.wikimedia.org/P57428 and previous config saved to /var/cache/conftool/dbconfig/20240220-222423-marostegui.json
  • 22:20 sfaci@deploy2002: Started deploy [analytics/refinery@d078656]: Regular analytics weekly train [analytics/refinery@d0786561]
  • 22:18 sfaci: Starting refinery deployment
  • 22:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P57427 and previous config saved to /var/cache/conftool/dbconfig/20240220-220917-marostegui.json
  • 22:00 cjming: end of UTC late backport window
  • 21:58 cjming@deploy2002: Finished scap: Backport for Fix for regression in audio track suppression logic (T357942), Fix for regression in audio track suppression logic (T357942) (duration: 09m 24s)
  • 21:56 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:56 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P57426 and previous config saved to /var/cache/conftool/dbconfig/20240220-215410-marostegui.json
  • 21:51 cjming@deploy2002: brion and cjming: Continuing with sync
  • 21:50 cjming@deploy2002: brion and cjming: Backport for Fix for regression in audio track suppression logic (T357942), Fix for regression in audio track suppression logic (T357942) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:49 cjming@deploy2002: Started scap: Backport for Fix for regression in audio track suppression logic (T357942), Fix for regression in audio track suppression logic (T357942)
  • 21:48 cjming@deploy2002: Finished scap: Backport for Enable night mode on mobile test servers (T357759) (duration: 11m 01s)
  • 21:48 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:48 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:47 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:42 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:42 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 21:40 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:39 cjming@deploy2002: cjming and jdlrobson: Backport for Enable night mode on mobile test servers (T357759) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T355609)', diff saved to https://phabricator.wikimedia.org/P57424 and previous config saved to /var/cache/conftool/dbconfig/20240220-213904-marostegui.json
  • 21:37 cjming@deploy2002: Started scap: Backport for Enable night mode on mobile test servers (T357759)
  • 21:35 cjming@deploy2002: Finished scap: Backport for Enable desktop diff for anonymous users on enwiki (T350181) (duration: 13m 19s)
  • 21:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 21:28 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:28 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:27 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 21:24 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:24 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:23 cjming@deploy2002: jdlrobson and cjming: Backport for Enable desktop diff for anonymous users on enwiki (T350181) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:22 cjming@deploy2002: Started scap: Backport for Enable desktop diff for anonymous users on enwiki (T350181)
  • 21:20 cjming@deploy2002: Finished scap: Backport for Correctly turn on Parsoid read views by default on wikitech Talk pages (duration: 12m 53s)
  • 21:11 cjming@deploy2002: cscott and cjming: Continuing with sync
  • 21:08 cjming@deploy2002: cscott and cjming: Backport for Correctly turn on Parsoid read views by default on wikitech Talk pages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T355609)', diff saved to https://phabricator.wikimedia.org/P57423 and previous config saved to /var/cache/conftool/dbconfig/20240220-210840-marostegui.json
  • 21:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 21:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 21:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T355609)', diff saved to https://phabricator.wikimedia.org/P57422 and previous config saved to /var/cache/conftool/dbconfig/20240220-210819-marostegui.json
  • 21:07 cjming@deploy2002: Started scap: Backport for Correctly turn on Parsoid read views by default on wikitech Talk pages
  • 21:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:56 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:56 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:56 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 20:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P57421 and previous config saved to /var/cache/conftool/dbconfig/20240220-205312-marostegui.json
  • 20:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P57420 and previous config saved to /var/cache/conftool/dbconfig/20240220-203806-marostegui.json
  • 20:35 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:35 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudelastic[1001-1004].wikimedia.org
  • 20:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic[1001-1004].wikimedia.org decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 20:31 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:31 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:30 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic[1001-1004].wikimedia.org decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 20:27 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 20:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T355609)', diff saved to https://phabricator.wikimedia.org/P57419 and previous config saved to /var/cache/conftool/dbconfig/20240220-202300-marostegui.json
  • 20:01 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic[1001-1004].wikimedia.org
  • 19:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T355609)', diff saved to https://phabricator.wikimedia.org/P57417 and previous config saved to /var/cache/conftool/dbconfig/20240220-195303-marostegui.json
  • 19:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 19:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T355609)', diff saved to https://phabricator.wikimedia.org/P57416 and previous config saved to /var/cache/conftool/dbconfig/20240220-195242-marostegui.json
  • 19:48 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T347624, testing 961878 patch) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T347624, testing 961878 patch) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 19:43 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 19:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57415 and previous config saved to /var/cache/conftool/dbconfig/20240220-193842-arnaudb.json
  • 19:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P57414 and previous config saved to /var/cache/conftool/dbconfig/20240220-193735-marostegui.json
  • 19:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:35 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P57413 and previous config saved to /var/cache/conftool/dbconfig/20240220-192335-arnaudb.json
  • 19:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P57412 and previous config saved to /var/cache/conftool/dbconfig/20240220-192229-marostegui.json
  • 19:12 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.19 refs T354437
  • 19:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P57411 and previous config saved to /var/cache/conftool/dbconfig/20240220-190829-arnaudb.json
  • 19:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T355609)', diff saved to https://phabricator.wikimedia.org/P57410 and previous config saved to /var/cache/conftool/dbconfig/20240220-190722-marostegui.json
  • 18:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57409 and previous config saved to /var/cache/conftool/dbconfig/20240220-185322-arnaudb.json
  • 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T357189)', diff saved to https://phabricator.wikimedia.org/P57408 and previous config saved to /var/cache/conftool/dbconfig/20240220-184925-arnaudb.json
  • 18:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 18:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57407 and previous config saved to /var/cache/conftool/dbconfig/20240220-184903-arnaudb.json
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T355609)', diff saved to https://phabricator.wikimedia.org/P57406 and previous config saved to /var/cache/conftool/dbconfig/20240220-184157-marostegui.json
  • 18:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T355609)', diff saved to https://phabricator.wikimedia.org/P57405 and previous config saved to /var/cache/conftool/dbconfig/20240220-184124-marostegui.json
  • 18:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P57404 and previous config saved to /var/cache/conftool/dbconfig/20240220-183356-arnaudb.json
  • 18:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet,service=(cdn|ats-be)
  • 18:31 sukhe: pool cp4052: bookworm cp host with haproxy 2.6 built against OpenSSL 1.1.1: T352744
  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P57403 and previous config saved to /var/cache/conftool/dbconfig/20240220-182617-marostegui.json
  • 18:22 sukhe: reprepro -C component/haproxy26 include bookworm-wikimedia haproxy_2.6.16-1~bpo12+1_amd64.changes: T352744
  • 18:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P57402 and previous config saved to /var/cache/conftool/dbconfig/20240220-181850-arnaudb.json
  • 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P57401 and previous config saved to /var/cache/conftool/dbconfig/20240220-181111-marostegui.json
  • 18:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57400 and previous config saved to /var/cache/conftool/dbconfig/20240220-180342-arnaudb.json
  • 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T357189)', diff saved to https://phabricator.wikimedia.org/P57399 and previous config saved to /var/cache/conftool/dbconfig/20240220-175938-arnaudb.json
  • 17:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 17:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T357189)', diff saved to https://phabricator.wikimedia.org/P57398 and previous config saved to /var/cache/conftool/dbconfig/20240220-175917-arnaudb.json
  • 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T355609)', diff saved to https://phabricator.wikimedia.org/P57397 and previous config saved to /var/cache/conftool/dbconfig/20240220-175605-marostegui.json
  • 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P57396 and previous config saved to /var/cache/conftool/dbconfig/20240220-174411-arnaudb.json
  • 17:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P57395 and previous config saved to /var/cache/conftool/dbconfig/20240220-172904-arnaudb.json
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T355609)', diff saved to https://phabricator.wikimedia.org/P57394 and previous config saved to /var/cache/conftool/dbconfig/20240220-172716-marostegui.json
  • 17:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 17:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 17:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T355609)', diff saved to https://phabricator.wikimedia.org/P57393 and previous config saved to /var/cache/conftool/dbconfig/20240220-172653-marostegui.json
  • 17:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS bookworm
  • 17:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T357189)', diff saved to https://phabricator.wikimedia.org/P57392 and previous config saved to /var/cache/conftool/dbconfig/20240220-171358-arnaudb.json
  • 17:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P57391 and previous config saved to /var/cache/conftool/dbconfig/20240220-171147-marostegui.json
  • 17:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T357189)', diff saved to https://phabricator.wikimedia.org/P57390 and previous config saved to /var/cache/conftool/dbconfig/20240220-170949-arnaudb.json
  • 17:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 17:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T357189)', diff saved to https://phabricator.wikimedia.org/P57389 and previous config saved to /var/cache/conftool/dbconfig/20240220-170928-arnaudb.json
  • 16:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P57388 and previous config saved to /var/cache/conftool/dbconfig/20240220-165641-marostegui.json
  • 16:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 16:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P57387 and previous config saved to /var/cache/conftool/dbconfig/20240220-165421-arnaudb.json
  • 16:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2005.codfw.wmnet with OS bookworm
  • 16:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,name=cp20(29|30).codfw.wmnet
  • 16:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[2029-2030].codfw.wmnet
  • 16:42 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp[2029-2030].codfw.wmnet
  • 16:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T355609)', diff saved to https://phabricator.wikimedia.org/P57386 and previous config saved to /var/cache/conftool/dbconfig/20240220-164134-marostegui.json
  • 16:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P57385 and previous config saved to /var/cache/conftool/dbconfig/20240220-163915-arnaudb.json
  • 16:35 reedy@deploy2002: Synchronized php-1.42.0-wmf.19/extensions/AntiSpoof/: T357995 (duration: 11m 02s)
  • 16:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57384 and previous config saved to /var/cache/conftool/dbconfig/20240220-163451-arnaudb.json
  • 16:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57383 and previous config saved to /var/cache/conftool/dbconfig/20240220-163447-arnaudb.json
  • 16:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57382 and previous config saved to /var/cache/conftool/dbconfig/20240220-163447-arnaudb.json
  • 16:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: maintenance done', diff saved to https://phabricator.wikimedia.org/P57381 and previous config saved to /var/cache/conftool/dbconfig/20240220-163442-arnaudb.json
  • 16:30 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1001.eqiad.wmnet
  • 16:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:27 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:24 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1001.eqiad.wmnet
  • 16:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T357189)', diff saved to https://phabricator.wikimedia.org/P57380 and previous config saved to /var/cache/conftool/dbconfig/20240220-162408-arnaudb.json
  • 16:21 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1002.eqiad.wmnet
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T357189)', diff saved to https://phabricator.wikimedia.org/P57379 and previous config saved to /var/cache/conftool/dbconfig/20240220-161953-arnaudb.json
  • 16:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57378 and previous config saved to /var/cache/conftool/dbconfig/20240220-161946-arnaudb.json
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57377 and previous config saved to /var/cache/conftool/dbconfig/20240220-161942-arnaudb.json
  • 16:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57376 and previous config saved to /var/cache/conftool/dbconfig/20240220-161942-arnaudb.json
  • 16:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 75%: maintenance done', diff saved to https://phabricator.wikimedia.org/P57375 and previous config saved to /var/cache/conftool/dbconfig/20240220-161937-arnaudb.json
  • 16:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 16:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T357189)', diff saved to https://phabricator.wikimedia.org/P57374 and previous config saved to /var/cache/conftool/dbconfig/20240220-161931-arnaudb.json
  • 16:18 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 16:14 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1002.eqiad.wmnet
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T355609)', diff saved to https://phabricator.wikimedia.org/P57373 and previous config saved to /var/cache/conftool/dbconfig/20240220-161348-marostegui.json
  • 16:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T355609)', diff saved to https://phabricator.wikimedia.org/P57372 and previous config saved to /var/cache/conftool/dbconfig/20240220-161326-marostegui.json
  • 16:12 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.eqiad.wmnet
  • 16:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 16:11 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 16:09 hnowlan@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=(mw2312.codfw.wmnet|mw2313.codfw.wmnet|mw2367.codfw.wmnet|mw2369.codfw.wmnet)
  • 16:07 topranks: Commencing network maintenance migrating servers to new switch codfw rack A7 T355867
  • 16:06 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 22 hosts with reason: Migrating servers in codfw rack A7 to lsw1-a7-codfw
  • 16:06 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 22 hosts with reason: Migrating servers in codfw rack A7 to lsw1-a7-codfw
  • 16:05 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.eqiad.wmnet
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57371 and previous config saved to /var/cache/conftool/dbconfig/20240220-160438-arnaudb.json
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57370 and previous config saved to /var/cache/conftool/dbconfig/20240220-160437-arnaudb.json
  • 16:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: maintenance done', diff saved to https://phabricator.wikimedia.org/P57369 and previous config saved to /var/cache/conftool/dbconfig/20240220-160432-arnaudb.json
  • 16:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57368 and previous config saved to /var/cache/conftool/dbconfig/20240220-160429-arnaudb.json
  • 16:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P57367 and previous config saved to /var/cache/conftool/dbconfig/20240220-160423-arnaudb.json
  • 16:02 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a7-codfw.mgmt with reason: prepping for server uplink migration codfw rack a7
  • 16:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a7-codfw.mgmt with reason: prepping for server uplink migration codfw rack a7
  • 16:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 16:00 hnowlan: running `homer 'cr*codfw*' commit 'T351074'` for new k8s workers
  • 16:00 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2089*,elastic2062*,elastic2061* for switch maintenance - bking@cumin2002 - T355860
  • 16:00 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2089*,elastic2062*,elastic2061* for switch maintenance - bking@cumin2002 - T355860
  • 15:59 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P57366 and previous config saved to /var/cache/conftool/dbconfig/20240220-155820-marostegui.json
  • 15:55 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@b115452]: (no justification provided) (duration: 00m 34s)
  • 15:55 Emperor: import ceph-reef packages to apt1001 T279621
  • 15:55 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@b115452]: (no justification provided)
  • 15:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:53 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:53 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:50 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:50 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:49 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57365 and previous config saved to /var/cache/conftool/dbconfig/20240220-154924-arnaudb.json
  • 15:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57364 and previous config saved to /var/cache/conftool/dbconfig/20240220-154920-arnaudb.json
  • 15:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: Maintenance done', diff saved to https://phabricator.wikimedia.org/P57363 and previous config saved to /var/cache/conftool/dbconfig/20240220-154920-arnaudb.json
  • 15:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P57362 and previous config saved to /var/cache/conftool/dbconfig/20240220-154917-arnaudb.json
  • 15:46 denisse: When doing the alert hosts upgrade we encountered some issues that prevented us from properly reimaging the hosts to proceed with the upgrade. We're investigating this issue and will announce the new alert hosts upgrade date ASAP. - T333615
  • 15:46 godog: re-enable meta-monitoring on wikitech-static.w.o - T333615
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P57361 and previous config saved to /var/cache/conftool/dbconfig/20240220-154313-marostegui.json
  • 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1233.eqiad.wmnet
  • 15:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1168.eqiad.wmnet
  • 15:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1226.eqiad.wmnet
  • 15:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1210.eqiad.wmnet
  • 15:37 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1233.eqiad.wmnet
  • 15:37 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1226.eqiad.wmnet
  • 15:37 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1210.eqiad.wmnet
  • 15:36 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1168.eqiad.wmnet
  • 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1168 db1210 db1226 db1233 depool for T356240', diff saved to https://phabricator.wikimedia.org/P57359 and previous config saved to /var/cache/conftool/dbconfig/20240220-153557-arnaudb.json
  • 15:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T357189)', diff saved to https://phabricator.wikimedia.org/P57358 and previous config saved to /var/cache/conftool/dbconfig/20240220-153410-arnaudb.json
  • 15:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db[1168,1210,1226,1233].eqiad.wmnet with reason: Silence for reboot T356240
  • 15:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on db[1168,1210,1226,1233].eqiad.wmnet with reason: Silence for reboot T356240
  • 15:32 godog: temp disable meta-monitoring on wikitech-static.w.o - T333615
  • 15:30 Emperor: import ceph-reef packages to apt1001 T279621
  • 15:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T357189)', diff saved to https://phabricator.wikimedia.org/P57357 and previous config saved to /var/cache/conftool/dbconfig/20240220-153000-arnaudb.json
  • 15:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 15:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T357189)', diff saved to https://phabricator.wikimedia.org/P57356 and previous config saved to /var/cache/conftool/dbconfig/20240220-152933-arnaudb.json
  • 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T355609)', diff saved to https://phabricator.wikimedia.org/P57355 and previous config saved to /var/cache/conftool/dbconfig/20240220-152807-marostegui.json
  • 15:25 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 15:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 100%: After migration', diff saved to https://phabricator.wikimedia.org/P57354 and previous config saved to /var/cache/conftool/dbconfig/20240220-151812-root.json
  • 15:16 dcausse: depooled wdqs2009 & wdqs2020 (T355867)
  • 15:16 denisse_: starting the Alert hosts upgrade to Bookworm - T333615
  • 15:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P57353 and previous config saved to /var/cache/conftool/dbconfig/20240220-151426-arnaudb.json
  • 15:13 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 15:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db[2146,2151].codfw.wmnet
  • 14:55 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:55 bking@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 75%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57346 and previous config saved to /var/cache/conftool/dbconfig/20240220-145124-root.json
  • 14:50 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:49 sukhe: disable puppet on A:cp to merge CR 1004126
  • 14:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2029-2030].codfw.wmnet with reason: T355867
  • 14:49 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2029-2030].codfw.wmnet with reason: T355867
  • 14:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1231.eqiad.wmnet
  • 14:48 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 50%: After migration', diff saved to https://phabricator.wikimedia.org/P57345 and previous config saved to /var/cache/conftool/dbconfig/20240220-144803-root.json
  • 14:48 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 brett@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,name=cp20(29|30).codfw.wmnet
  • 14:48 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 75%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57344 and previous config saved to /var/cache/conftool/dbconfig/20240220-144753-root.json
  • 14:46 sukhe: updating pdns-recursor to 4.8.6-1 on dns*
  • 14:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P57343 and previous config saved to /var/cache/conftool/dbconfig/20240220-144539-marostegui.json
  • 14:44 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db1231.eqiad.wmnet
  • 14:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T357189)', diff saved to https://phabricator.wikimedia.org/P57342 and previous config saved to /var/cache/conftool/dbconfig/20240220-144414-arnaudb.json
  • 14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T357189)', diff saved to https://phabricator.wikimedia.org/P57341 and previous config saved to /var/cache/conftool/dbconfig/20240220-144001-arnaudb.json
  • 14:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T357189)', diff saved to https://phabricator.wikimedia.org/P57340 and previous config saved to /var/cache/conftool/dbconfig/20240220-143939-arnaudb.json
  • 14:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 50%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57339 and previous config saved to /var/cache/conftool/dbconfig/20240220-143619-root.json
  • 14:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 25%: After migration', diff saved to https://phabricator.wikimedia.org/P57338 and previous config saved to /var/cache/conftool/dbconfig/20240220-143258-root.json
  • 14:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 50%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57337 and previous config saved to /var/cache/conftool/dbconfig/20240220-143249-root.json
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P57336 and previous config saved to /var/cache/conftool/dbconfig/20240220-143032-marostegui.json
  • 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P57334 and previous config saved to /var/cache/conftool/dbconfig/20240220-142433-arnaudb.json
  • 14:21 claime: launching build-production-images - T342346
  • 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 25%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57333 and previous config saved to /var/cache/conftool/dbconfig/20240220-142114-root.json
  • 14:20 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 14:19 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2005.codfw.wmnet with OS bookworm
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 10%: After migration', diff saved to https://phabricator.wikimedia.org/P57332 and previous config saved to /var/cache/conftool/dbconfig/20240220-141752-root.json
  • 14:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 25%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57331 and previous config saved to /var/cache/conftool/dbconfig/20240220-141744-root.json
  • 14:15 claime: Uncordoning mw2379
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T355609)', diff saved to https://phabricator.wikimedia.org/P57330 and previous config saved to /var/cache/conftool/dbconfig/20240220-141525-marostegui.json
  • 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P57329 and previous config saved to /var/cache/conftool/dbconfig/20240220-140926-arnaudb.json
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 10%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57328 and previous config saved to /var/cache/conftool/dbconfig/20240220-140609-root.json
  • 14:05 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 14:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 5%: After migration', diff saved to https://phabricator.wikimedia.org/P57327 and previous config saved to /var/cache/conftool/dbconfig/20240220-140247-root.json
  • 14:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 10%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57326 and previous config saved to /var/cache/conftool/dbconfig/20240220-140239-root.json
  • 13:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2005.codfw.wmnet with reason: sretest
  • 13:55 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2005.codfw.wmnet with reason: sretest
  • 13:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T357189)', diff saved to https://phabricator.wikimedia.org/P57325 and previous config saved to /var/cache/conftool/dbconfig/20240220-135420-arnaudb.json
  • 13:54 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 13:54 marostegui@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2194 (re)pooling @ 5%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57324 and previous config saved to /var/cache/conftool/dbconfig/20240220-135104-root.json
  • 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2111 (T357189)', diff saved to https://phabricator.wikimedia.org/P57323 and previous config saved to /var/cache/conftool/dbconfig/20240220-134958-arnaudb.json
  • 13:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 13:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 1%: After migration', diff saved to https://phabricator.wikimedia.org/P57322 and previous config saved to /var/cache/conftool/dbconfig/20240220-134742-root.json
  • 13:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2190 (re)pooling @ 5%: After rearranging sections T354826', diff saved to https://phabricator.wikimedia.org/P57321 and previous config saved to /var/cache/conftool/dbconfig/20240220-134734-root.json
  • 13:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 13:47 jynus: setting up mariadb instances at db2097
  • 13:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 13:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 13:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 13:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T355609)', diff saved to https://phabricator.wikimedia.org/P57320 and previous config saved to /var/cache/conftool/dbconfig/20240220-134403-marostegui.json
  • 13:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 13:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T357189)', diff saved to https://phabricator.wikimedia.org/P57319 and previous config saved to /var/cache/conftool/dbconfig/20240220-134354-arnaudb.json
  • 13:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T355609)', diff saved to https://phabricator.wikimedia.org/P57318 and previous config saved to /var/cache/conftool/dbconfig/20240220-134334-marostegui.json
  • 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P57317 and previous config saved to /var/cache/conftool/dbconfig/20240220-132848-arnaudb.json
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P57316 and previous config saved to /var/cache/conftool/dbconfig/20240220-132827-marostegui.json
  • 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P57315 and previous config saved to /var/cache/conftool/dbconfig/20240220-131341-arnaudb.json
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P57314 and previous config saved to /var/cache/conftool/dbconfig/20240220-131320-marostegui.json
  • 13:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2190.codfw.wmnet onto db2194.codfw.wmnet
  • 12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T357189)', diff saved to https://phabricator.wikimedia.org/P57313 and previous config saved to /var/cache/conftool/dbconfig/20240220-125835-arnaudb.json
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T355609)', diff saved to https://phabricator.wikimedia.org/P57312 and previous config saved to /var/cache/conftool/dbconfig/20240220-125814-marostegui.json
  • 12:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T357189)', diff saved to https://phabricator.wikimedia.org/P57311 and previous config saved to /var/cache/conftool/dbconfig/20240220-125516-arnaudb.json
  • 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 12:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 12:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 12:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T357189)', diff saved to https://phabricator.wikimedia.org/P57310 and previous config saved to /var/cache/conftool/dbconfig/20240220-125311-arnaudb.json
  • 12:48 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 12:48 marostegui@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 12:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P57309 and previous config saved to /var/cache/conftool/dbconfig/20240220-123804-arnaudb.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T355609)', diff saved to https://phabricator.wikimedia.org/P57308 and previous config saved to /var/cache/conftool/dbconfig/20240220-122947-marostegui.json
  • 12:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T355609)', diff saved to https://phabricator.wikimedia.org/P57307 and previous config saved to /var/cache/conftool/dbconfig/20240220-122907-marostegui.json
  • 12:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P57306 and previous config saved to /var/cache/conftool/dbconfig/20240220-122258-arnaudb.json
  • 12:18 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2384.codfw.wmnet with OS bullseye
  • 12:18 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2385.codfw.wmnet with OS bullseye
  • 12:16 claime: Draining mw2379
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P57305 and previous config saved to /var/cache/conftool/dbconfig/20240220-121402-marostegui.json
  • 12:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T357189)', diff saved to https://phabricator.wikimedia.org/P57304 and previous config saved to /var/cache/conftool/dbconfig/20240220-120752-arnaudb.json
  • 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T357189)', diff saved to https://phabricator.wikimedia.org/P57303 and previous config saved to /var/cache/conftool/dbconfig/20240220-120434-arnaudb.json
  • 12:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T357189)', diff saved to https://phabricator.wikimedia.org/P57302 and previous config saved to /var/cache/conftool/dbconfig/20240220-120412-arnaudb.json
  • 12:04 kart_: cxserver: Update to 2024-02-15-085232-production + Bump mesh.configuration to 1.7 (T333969, T352747, T355686, T255568)
  • 12:03 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2385.codfw.wmnet with OS bullseye
  • 12:03 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2385.codfw.wmnet with OS bullseye
  • 12:02 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
  • 12:02 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2384.codfw.wmnet with OS bullseye
  • 12:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2369.codfw.wmnet with OS bullseye
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57301 and previous config saved to /var/cache/conftool/dbconfig/20240220-120031-root.json
  • 12:00 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:59 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P57300 and previous config saved to /var/cache/conftool/dbconfig/20240220-115855-marostegui.json
  • 11:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2367.codfw.wmnet with OS bullseye
  • 11:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2313.codfw.wmnet with OS bullseye
  • 11:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:54 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:51 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2312.codfw.wmnet with OS bullseye
  • 11:51 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:50 sukhe: updating pdns-recursor to 4.8.6-1 on doh* hosts
  • 11:50 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P57299 and previous config saved to /var/cache/conftool/dbconfig/20240220-114906-arnaudb.json
  • 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57298 and previous config saved to /var/cache/conftool/dbconfig/20240220-114526-root.json
  • 11:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T355609)', diff saved to https://phabricator.wikimedia.org/P57297 and previous config saved to /var/cache/conftool/dbconfig/20240220-114349-marostegui.json
  • 11:42 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2369.codfw.wmnet with reason: host reimage
  • 11:39 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2367.codfw.wmnet with reason: host reimage
  • 11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2313.codfw.wmnet with reason: host reimage
  • 11:35 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2367.codfw.wmnet with reason: host reimage
  • 11:35 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2369.codfw.wmnet with reason: host reimage
  • 11:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P57296 and previous config saved to /var/cache/conftool/dbconfig/20240220-113401-arnaudb.json
  • 11:33 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2312.codfw.wmnet with reason: host reimage
  • 11:33 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2190.codfw.wmnet onto db2194.codfw.wmnet
  • 11:30 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2312.codfw.wmnet with reason: host reimage
  • 11:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57295 and previous config saved to /var/cache/conftool/dbconfig/20240220-113021-root.json
  • 11:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2385.codfw.wmnet with OS bullseye
  • 11:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
  • 11:29 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2385.codfw.wmnet with OS bullseye
  • 11:29 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2384.codfw.wmnet with OS bullseye
  • 11:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bookworm
  • 11:19 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2385.codfw.wmnet with OS bullseye
  • 11:19 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2384.codfw.wmnet with OS bullseye
  • 11:19 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2369.codfw.wmnet with OS bullseye
  • 11:19 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2367.codfw.wmnet with OS bullseye
  • 11:19 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2313.codfw.wmnet with OS bullseye
  • 11:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T357189)', diff saved to https://phabricator.wikimedia.org/P57294 and previous config saved to /var/cache/conftool/dbconfig/20240220-111854-arnaudb.json
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T355609)', diff saved to https://phabricator.wikimedia.org/P57293 and previous config saved to /var/cache/conftool/dbconfig/20240220-111722-marostegui.json
  • 11:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T355609)', diff saved to https://phabricator.wikimedia.org/P57292 and previous config saved to /var/cache/conftool/dbconfig/20240220-111700-marostegui.json
  • 11:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T357189)', diff saved to https://phabricator.wikimedia.org/P57291 and previous config saved to /var/cache/conftool/dbconfig/20240220-111531-arnaudb.json
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57290 and previous config saved to /var/cache/conftool/dbconfig/20240220-111525-root.json
  • 11:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 11:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57289 and previous config saved to /var/cache/conftool/dbconfig/20240220-111516-root.json
  • 11:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 11:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T357189)', diff saved to https://phabricator.wikimedia.org/P57288 and previous config saved to /var/cache/conftool/dbconfig/20240220-111510-arnaudb.json
  • 11:14 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw2312.codfw.wmnet with OS bullseye
  • 11:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Place db2194 in s3 depooled T354826', diff saved to https://phabricator.wikimedia.org/P57287 and previous config saved to /var/cache/conftool/dbconfig/20240220-110444-marostegui.json
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P57286 and previous config saved to /var/cache/conftool/dbconfig/20240220-110154-marostegui.json
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2190', diff saved to https://phabricator.wikimedia.org/P57285 and previous config saved to /var/cache/conftool/dbconfig/20240220-110020-root.json
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57284 and previous config saved to /var/cache/conftool/dbconfig/20240220-110011-root.json
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57283 and previous config saved to /var/cache/conftool/dbconfig/20240220-110008-root.json
  • 11:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P57282 and previous config saved to /var/cache/conftool/dbconfig/20240220-110004-arnaudb.json
  • 11:00 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2194 multi instance', diff saved to https://phabricator.wikimedia.org/P57281 and previous config saved to /var/cache/conftool/dbconfig/20240220-105959-marostegui.json
  • 10:56 slyngs: Import CAS 6.6.12+wmf11u2 in apt-repo
  • 10:50 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudvirt1032.eqiad.wmnet with reason: nova-compute registration
  • 10:50 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudvirt1032.eqiad.wmnet with reason: nova-compute registration
  • 10:48 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bookworm
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P57280 and previous config saved to /var/cache/conftool/dbconfig/20240220-104647-marostegui.json
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2194', diff saved to https://phabricator.wikimedia.org/P57279 and previous config saved to /var/cache/conftool/dbconfig/20240220-104633-root.json
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57278 and previous config saved to /var/cache/conftool/dbconfig/20240220-104231-root.json
  • 10:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P57277 and previous config saved to /var/cache/conftool/dbconfig/20240220-104209-arnaudb.json
  • 10:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2169.codfw.wmnet with OS bookworm
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57276 and previous config saved to /var/cache/conftool/dbconfig/20240220-103842-root.json
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cumin1001.eqiad.wmnet with reason: being taken down
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on cumin1001.eqiad.wmnet with reason: being taken down
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T355609)', diff saved to https://phabricator.wikimedia.org/P57275 and previous config saved to /var/cache/conftool/dbconfig/20240220-103141-marostegui.json
  • 10:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T357189)', diff saved to https://phabricator.wikimedia.org/P57274 and previous config saved to /var/cache/conftool/dbconfig/20240220-102703-arnaudb.json
  • 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T357189)', diff saved to https://phabricator.wikimedia.org/P57273 and previous config saved to /var/cache/conftool/dbconfig/20240220-102344-arnaudb.json
  • 10:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57272 and previous config saved to /var/cache/conftool/dbconfig/20240220-102337-root.json
  • 10:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T357189)', diff saved to https://phabricator.wikimedia.org/P57271 and previous config saved to /var/cache/conftool/dbconfig/20240220-102322-arnaudb.json
  • 10:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 10:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: host reimage
  • 10:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57270 and previous config saved to /var/cache/conftool/dbconfig/20240220-101206-root.json
  • 10:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 10:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57269 and previous config saved to /var/cache/conftool/dbconfig/20240220-100832-root.json
  • 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P57268 and previous config saved to /var/cache/conftool/dbconfig/20240220-100816-arnaudb.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2169 to s6 depooled', diff saved to https://phabricator.wikimedia.org/P57267 and previous config saved to /var/cache/conftool/dbconfig/20240220-100623-marostegui.json
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T355609)', diff saved to https://phabricator.wikimedia.org/P57266 and previous config saved to /var/cache/conftool/dbconfig/20240220-100511-marostegui.json
  • 10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T355609)', diff saved to https://phabricator.wikimedia.org/P57265 and previous config saved to /var/cache/conftool/dbconfig/20240220-100449-marostegui.json
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2169 multiinstance', diff saved to https://phabricator.wikimedia.org/P57264 and previous config saved to /var/cache/conftool/dbconfig/20240220-100444-marostegui.json
  • 10:00 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 09:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57263 and previous config saved to /var/cache/conftool/dbconfig/20240220-095701-root.json
  • 09:56 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2169.codfw.wmnet with OS bookworm
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2169', diff saved to https://phabricator.wikimedia.org/P57262 and previous config saved to /var/cache/conftool/dbconfig/20240220-095353-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57261 and previous config saved to /var/cache/conftool/dbconfig/20240220-095327-root.json
  • 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P57260 and previous config saved to /var/cache/conftool/dbconfig/20240220-095310-arnaudb.json
  • 09:49 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 09:46 filippo@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:46 filippo@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P57259 and previous config saved to /var/cache/conftool/dbconfig/20240220-094334-marostegui.json
  • 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57258 and previous config saved to /var/cache/conftool/dbconfig/20240220-094156-root.json
  • 09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T357189)', diff saved to https://phabricator.wikimedia.org/P57257 and previous config saved to /var/cache/conftool/dbconfig/20240220-093803-arnaudb.json
  • 09:36 moritzm: installing imagemagick security updates
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57256 and previous config saved to /var/cache/conftool/dbconfig/20240220-093607-root.json
  • 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T357189)', diff saved to https://phabricator.wikimedia.org/P57255 and previous config saved to /var/cache/conftool/dbconfig/20240220-093442-arnaudb.json
  • 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 09:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57254 and previous config saved to /var/cache/conftool/dbconfig/20240220-093420-arnaudb.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P57253 and previous config saved to /var/cache/conftool/dbconfig/20240220-092827-marostegui.json
  • 09:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57252 and previous config saved to /var/cache/conftool/dbconfig/20240220-092651-root.json
  • 09:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2167.codfw.wmnet with OS bookworm
  • 09:23 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:22 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:21 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57251 and previous config saved to /var/cache/conftool/dbconfig/20240220-092102-root.json
  • 09:21 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P57250 and previous config saved to /var/cache/conftool/dbconfig/20240220-091914-arnaudb.json
  • 09:16 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:16 dcausse@deploy2002: Finished deploy [airflow-dags/search@088b013]: search: wdqs updater set proper start date (duration: 00m 26s)
  • 09:16 dcausse@deploy2002: Started deploy [airflow-dags/search@088b013]: search: wdqs updater set proper start date
  • 09:15 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T355609)', diff saved to https://phabricator.wikimedia.org/P57249 and previous config saved to /var/cache/conftool/dbconfig/20240220-091321-marostegui.json
  • 09:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57248 and previous config saved to /var/cache/conftool/dbconfig/20240220-091146-root.json
  • 09:09 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:08 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57247 and previous config saved to /var/cache/conftool/dbconfig/20240220-090557-root.json
  • 09:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 09:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P57246 and previous config saved to /var/cache/conftool/dbconfig/20240220-090408-arnaudb.json
  • 09:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2167.codfw.wmnet with reason: host reimage
  • 09:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2138.codfw.wmnet with OS bookworm
  • 08:57 dcausse@deploy2002: Finished deploy [airflow-dags/search@a6356d2]: search: wdqs-updater reconcile, do not create the dag dynamically (duration: 00m 28s)
  • 08:56 dcausse@deploy2002: Started deploy [airflow-dags/search@a6356d2]: search: wdqs-updater reconcile, do not create the dag dynamically
  • 08:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57245 and previous config saved to /var/cache/conftool/dbconfig/20240220-085641-root.json
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57244 and previous config saved to /var/cache/conftool/dbconfig/20240220-085222-root.json
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57243 and previous config saved to /var/cache/conftool/dbconfig/20240220-085052-root.json
  • 08:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57242 and previous config saved to /var/cache/conftool/dbconfig/20240220-084901-arnaudb.json
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T355609)', diff saved to https://phabricator.wikimedia.org/P57241 and previous config saved to /var/cache/conftool/dbconfig/20240220-084637-marostegui.json
  • 08:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T357189)', diff saved to https://phabricator.wikimedia.org/P57240 and previous config saved to /var/cache/conftool/dbconfig/20240220-084530-arnaudb.json
  • 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2167.codfw.wmnet with OS bookworm
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2167', diff saved to https://phabricator.wikimedia.org/P57239 and previous config saved to /var/cache/conftool/dbconfig/20240220-084136-root.json
  • 08:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2138.codfw.wmnet with reason: host reimage
  • 08:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2138.codfw.wmnet with reason: host reimage
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57238 and previous config saved to /var/cache/conftool/dbconfig/20240220-083718-root.json
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57237 and previous config saved to /var/cache/conftool/dbconfig/20240220-083547-root.json
  • 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 5%: After migration', diff saved to https://phabricator.wikimedia.org/P57236 and previous config saved to /var/cache/conftool/dbconfig/20240220-083132-root.json
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57235 and previous config saved to /var/cache/conftool/dbconfig/20240220-082515-root.json
  • 08:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2168.codfw.wmnet with OS bookworm
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57234 and previous config saved to /var/cache/conftool/dbconfig/20240220-082213-root.json
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57233 and previous config saved to /var/cache/conftool/dbconfig/20240220-082043-root.json
  • 08:19 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2138.codfw.wmnet with OS bookworm
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2138', diff saved to https://phabricator.wikimedia.org/P57232 and previous config saved to /var/cache/conftool/dbconfig/20240220-081740-root.json
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 1%: After migration', diff saved to https://phabricator.wikimedia.org/P57231 and previous config saved to /var/cache/conftool/dbconfig/20240220-081627-root.json
  • 08:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2170.codfw.wmnet with OS bookworm
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57230 and previous config saved to /var/cache/conftool/dbconfig/20240220-081353-root.json
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57229 and previous config saved to /var/cache/conftool/dbconfig/20240220-081010-root.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57228 and previous config saved to /var/cache/conftool/dbconfig/20240220-080708-root.json
  • 08:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 08:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2168.codfw.wmnet with reason: host reimage
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57227 and previous config saved to /var/cache/conftool/dbconfig/20240220-075848-root.json
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57226 and previous config saved to /var/cache/conftool/dbconfig/20240220-075505-root.json
  • 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2170.codfw.wmnet with reason: host reimage
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57225 and previous config saved to /var/cache/conftool/dbconfig/20240220-075203-root.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 100%: After migration', diff saved to https://phabricator.wikimedia.org/P57224 and previous config saved to /var/cache/conftool/dbconfig/20240220-075128-root.json
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57223 and previous config saved to /var/cache/conftool/dbconfig/20240220-074343-root.json
  • 07:40 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2168.codfw.wmnet with OS bookworm
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57222 and previous config saved to /var/cache/conftool/dbconfig/20240220-074000-root.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2168', diff saved to https://phabricator.wikimedia.org/P57221 and previous config saved to /var/cache/conftool/dbconfig/20240220-073912-root.json
  • 07:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2171 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57220 and previous config saved to /var/cache/conftool/dbconfig/20240220-073658-root.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 75%: After migration', diff saved to https://phabricator.wikimedia.org/P57219 and previous config saved to /var/cache/conftool/dbconfig/20240220-073623-root.json
  • 07:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57218 and previous config saved to /var/cache/conftool/dbconfig/20240220-073313-root.json
  • 07:32 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2170.codfw.wmnet with OS bookworm
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2170', diff saved to https://phabricator.wikimedia.org/P57217 and previous config saved to /var/cache/conftool/dbconfig/20240220-073139-root.json
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57216 and previous config saved to /var/cache/conftool/dbconfig/20240220-072838-root.json
  • 07:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2171.codfw.wmnet with OS bookworm
  • 07:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56286
  • 07:27 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 56286
  • 07:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60501
  • 07:26 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 60501
  • 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18779
  • 07:26 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 18779
  • 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26554
  • 07:25 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 26554
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57215 and previous config saved to /var/cache/conftool/dbconfig/20240220-072455-root.json
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 50%: After migration', diff saved to https://phabricator.wikimedia.org/P57214 and previous config saved to /var/cache/conftool/dbconfig/20240220-072118-root.json
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57213 and previous config saved to /var/cache/conftool/dbconfig/20240220-071808-root.json
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57212 and previous config saved to /var/cache/conftool/dbconfig/20240220-071333-root.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57211 and previous config saved to /var/cache/conftool/dbconfig/20240220-070948-root.json
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 25%: After migration', diff saved to https://phabricator.wikimedia.org/P57210 and previous config saved to /var/cache/conftool/dbconfig/20240220-070613-root.json
  • 07:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57209 and previous config saved to /var/cache/conftool/dbconfig/20240220-070303-root.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 06:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1244.eqiad.wmnet with OS bookworm
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57208 and previous config saved to /var/cache/conftool/dbconfig/20240220-065828-root.json
  • 06:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2171.codfw.wmnet with reason: host reimage
  • 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 10%: After migration', diff saved to https://phabricator.wikimedia.org/P57207 and previous config saved to /var/cache/conftool/dbconfig/20240220-065108-root.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57206 and previous config saved to /var/cache/conftool/dbconfig/20240220-064758-root.json
  • 06:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
  • 06:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Place db2171 in s5 depooled T354826', diff saved to https://phabricator.wikimedia.org/P57205 and previous config saved to /var/cache/conftool/dbconfig/20240220-064152-marostegui.json
  • 06:40 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2171 multi-instance', diff saved to https://phabricator.wikimedia.org/P57204 and previous config saved to /var/cache/conftool/dbconfig/20240220-064014-marostegui.json
  • 06:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1244.eqiad.wmnet with reason: host reimage
  • 06:39 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2171.codfw.wmnet with OS bookworm
  • 06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1244.eqiad.wmnet with reason: host reimage
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 5%: After migration', diff saved to https://phabricator.wikimedia.org/P57203 and previous config saved to /var/cache/conftool/dbconfig/20240220-063603-root.json
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2171 T354826', diff saved to https://phabricator.wikimedia.org/P57202 and previous config saved to /var/cache/conftool/dbconfig/20240220-063521-marostegui.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57201 and previous config saved to /var/cache/conftool/dbconfig/20240220-063254-root.json
  • 06:29 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1246', diff saved to https://phabricator.wikimedia.org/P57200 and previous config saved to /var/cache/conftool/dbconfig/20240220-062759-root.json
  • 06:24 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1244.eqiad.wmnet with OS bookworm
  • 06:22 marostegui@deploy2002: Finished scap: Backport for Revert "db-production.php: Disable writes on es4" (duration: 09m 32s)
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2021 (re)pooling @ 1%: After migration', diff saved to https://phabricator.wikimedia.org/P57199 and previous config saved to /var/cache/conftool/dbconfig/20240220-062058-root.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1244', diff saved to https://phabricator.wikimedia.org/P57198 and previous config saved to /var/cache/conftool/dbconfig/20240220-061932-root.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57197 and previous config saved to /var/cache/conftool/dbconfig/20240220-061749-root.json
  • 06:14 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:14 marostegui@deploy2002: marostegui: Backport for Revert "db-production.php: Disable writes on es4" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:13 marostegui@deploy2002: Started scap: Backport for Revert "db-production.php: Disable writes on es4"
  • 06:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS bookworm
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Add weight to es2020', diff saved to https://phabricator.wikimedia.org/P57196 and previous config saved to /var/cache/conftool/dbconfig/20240220-061049-root.json
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2021 T356372', diff saved to https://phabricator.wikimedia.org/P57195 and previous config saved to /var/cache/conftool/dbconfig/20240220-061025-marostegui.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2020 to es4 primary T356372', diff saved to https://phabricator.wikimedia.org/P57194 and previous config saved to /var/cache/conftool/dbconfig/20240220-060852-marostegui.json
  • 06:08 marostegui: Starting es4 codfw failover from es2021 to es2020 - T356372
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2020 with weight 0 T356372', diff saved to https://phabricator.wikimedia.org/P57193 and previous config saved to /var/cache/conftool/dbconfig/20240220-060404-marostegui.json
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es4 T356372
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es4 T356372
  • 06:01 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2137.codfw.wmnet with OS bookworm
  • 06:00 marostegui@deploy2002: Finished scap: Backport for db-production.php: Disable writes on es4 (T356372) (duration: 09m 36s)
  • 05:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage
  • 05:55 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2137.codfw.wmnet with OS bookworm
  • 05:54 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2137.codfw.wmnet with OS bookworm
  • 05:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1170.eqiad.wmnet with reason: host reimage
  • 05:52 marostegui@deploy2002: marostegui: Continuing with sync
  • 05:52 marostegui@deploy2002: marostegui: Backport for db-production.php: Disable writes on es4 (T356372) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:50 marostegui@deploy2002: Started scap: Backport for db-production.php: Disable writes on es4 (T356372)
  • 05:45 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2137.codfw.wmnet with OS bookworm
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2137 for reimage', diff saved to https://phabricator.wikimedia.org/P57192 and previous config saved to /var/cache/conftool/dbconfig/20240220-054156-marostegui.json
  • 05:41 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS bookworm
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1170 for reimage', diff saved to https://phabricator.wikimedia.org/P57191 and previous config saved to /var/cache/conftool/dbconfig/20240220-053920-marostegui.json
  • 04:56 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.19 refs T354437 (duration: 52m 09s)
  • 04:04 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.19 refs T354437
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.16 (duration: 01m 57s)
  • 02:15 tstarling@deploy2002: Synchronized wmf-config/CommonSettings.php: Set $wgLoginNotifyUseCheckUser = false T346989 (duration: 08m 13s)

2024-02-19

  • 23:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 23:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 23:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57190 and previous config saved to /var/cache/conftool/dbconfig/20240219-234251-arnaudb.json
  • 23:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P57189 and previous config saved to /var/cache/conftool/dbconfig/20240219-232745-arnaudb.json
  • 23:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P57188 and previous config saved to /var/cache/conftool/dbconfig/20240219-231238-arnaudb.json
  • 22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57187 and previous config saved to /var/cache/conftool/dbconfig/20240219-225732-arnaudb.json
  • 22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T357189)', diff saved to https://phabricator.wikimedia.org/P57186 and previous config saved to /var/cache/conftool/dbconfig/20240219-224117-arnaudb.json
  • 22:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 22:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 22:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T357189)', diff saved to https://phabricator.wikimedia.org/P57185 and previous config saved to /var/cache/conftool/dbconfig/20240219-224054-arnaudb.json
  • 22:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P57184 and previous config saved to /var/cache/conftool/dbconfig/20240219-222547-arnaudb.json
  • 22:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P57183 and previous config saved to /var/cache/conftool/dbconfig/20240219-221239-ladsgroup.json
  • 22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P57182 and previous config saved to /var/cache/conftool/dbconfig/20240219-221041-arnaudb.json
  • 21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P57181 and previous config saved to /var/cache/conftool/dbconfig/20240219-215733-ladsgroup.json
  • 21:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T357189)', diff saved to https://phabricator.wikimedia.org/P57180 and previous config saved to /var/cache/conftool/dbconfig/20240219-215534-arnaudb.json
  • 21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T357189)', diff saved to https://phabricator.wikimedia.org/P57179 and previous config saved to /var/cache/conftool/dbconfig/20240219-215217-arnaudb.json
  • 21:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T357189)', diff saved to https://phabricator.wikimedia.org/P57178 and previous config saved to /var/cache/conftool/dbconfig/20240219-215155-arnaudb.json
  • 21:42 zabe@deploy2002: Finished scap: Backport for EditAttemptStep: log buckets for the edit check test (T342930), Enrollment for the edit check a/b test (T342930), Launch the Visual Editor edit check a/b test (T342930 T352127), Default VE on mobile for other wikis (T352127) (duration: 17m 25s)
  • 21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P57177 and previous config saved to /var/cache/conftool/dbconfig/20240219-214227-ladsgroup.json
  • 21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P57176 and previous config saved to /var/cache/conftool/dbconfig/20240219-213648-arnaudb.json
  • 21:35 zabe@deploy2002: kemayo and zabe: Continuing with sync
  • 21:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P57175 and previous config saved to /var/cache/conftool/dbconfig/20240219-212720-ladsgroup.json
  • 21:26 zabe@deploy2002: kemayo and zabe: Backport for EditAttemptStep: log buckets for the edit check test (T342930), Enrollment for the edit check a/b test (T342930), Launch the Visual Editor edit check a/b test (T342930 T352127), Default VE on mobile for other wikis (T352127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:25 zabe@deploy2002: Started scap: Backport for EditAttemptStep: log buckets for the edit check test (T342930), Enrollment for the edit check a/b test (T342930), Launch the Visual Editor edit check a/b test (T342930 T352127), Default VE on mobile for other wikis (T352127)
  • 21:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P57174 and previous config saved to /var/cache/conftool/dbconfig/20240219-212141-arnaudb.json
  • 21:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T357189)', diff saved to https://phabricator.wikimedia.org/P57173 and previous config saved to /var/cache/conftool/dbconfig/20240219-210635-arnaudb.json
  • 21:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2171:3316 (T357189)', diff saved to https://phabricator.wikimedia.org/P57172 and previous config saved to /var/cache/conftool/dbconfig/20240219-210228-arnaudb.json
  • 21:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 20:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T357189)', diff saved to https://phabricator.wikimedia.org/P57171 and previous config saved to /var/cache/conftool/dbconfig/20240219-205935-arnaudb.json
  • 20:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 100%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57169 and previous config saved to /var/cache/conftool/dbconfig/20240219-205047-arnaudb.json
  • 20:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P57168 and previous config saved to /var/cache/conftool/dbconfig/20240219-204429-arnaudb.json
  • 20:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 75%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57167 and previous config saved to /var/cache/conftool/dbconfig/20240219-203542-arnaudb.json
  • 20:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P57166 and previous config saved to /var/cache/conftool/dbconfig/20240219-202923-arnaudb.json
  • 20:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P57165 and previous config saved to /var/cache/conftool/dbconfig/20240219-202648-ladsgroup.json
  • 20:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T352010)', diff saved to https://phabricator.wikimedia.org/P57164 and previous config saved to /var/cache/conftool/dbconfig/20240219-202615-ladsgroup.json
  • 20:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 50%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57163 and previous config saved to /var/cache/conftool/dbconfig/20240219-202037-arnaudb.json
  • 20:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T357189)', diff saved to https://phabricator.wikimedia.org/P57162 and previous config saved to /var/cache/conftool/dbconfig/20240219-201416-arnaudb.json
  • 20:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P57161 and previous config saved to /var/cache/conftool/dbconfig/20240219-201353-root.json
  • 20:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P57160 and previous config saved to /var/cache/conftool/dbconfig/20240219-201109-ladsgroup.json
  • 20:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T357189)', diff saved to https://phabricator.wikimedia.org/P57159 and previous config saved to /var/cache/conftool/dbconfig/20240219-200914-arnaudb.json
  • 20:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 20:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 20:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T357189)', diff saved to https://phabricator.wikimedia.org/P57158 and previous config saved to /var/cache/conftool/dbconfig/20240219-200847-arnaudb.json
  • 20:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 40%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57157 and previous config saved to /var/cache/conftool/dbconfig/20240219-200533-arnaudb.json
  • 20:05 zabe@deploy2002: Finished scap: Backport for Remove reviewer group from testwiki (T356012) (duration: 09m 16s)
  • 19:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P57156 and previous config saved to /var/cache/conftool/dbconfig/20240219-195848-root.json
  • 19:57 zabe@deploy2002: zabe: Continuing with sync
  • 19:57 zabe@deploy2002: zabe: Backport for Remove reviewer group from testwiki (T356012) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:56 zabe@deploy2002: Started scap: Backport for Remove reviewer group from testwiki (T356012)
  • 19:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P57155 and previous config saved to /var/cache/conftool/dbconfig/20240219-195603-ladsgroup.json
  • 19:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P57154 and previous config saved to /var/cache/conftool/dbconfig/20240219-195341-arnaudb.json
  • 19:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 30%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57153 and previous config saved to /var/cache/conftool/dbconfig/20240219-195028-arnaudb.json
  • 19:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P57152 and previous config saved to /var/cache/conftool/dbconfig/20240219-194343-root.json
  • 19:42 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript emptyUserGroup.php --wiki=testwiki reviewer # T356012
  • 19:41 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --user="Yann" --overwrite . # T357218
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T352010)', diff saved to https://phabricator.wikimedia.org/P57151 and previous config saved to /var/cache/conftool/dbconfig/20240219-194056-ladsgroup.json
  • 19:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P57150 and previous config saved to /var/cache/conftool/dbconfig/20240219-193834-arnaudb.json
  • 19:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 20%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57149 and previous config saved to /var/cache/conftool/dbconfig/20240219-193522-arnaudb.json
  • 19:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P57148 and previous config saved to /var/cache/conftool/dbconfig/20240219-192838-root.json
  • 19:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2149.codfw.wmnet onto db2156.codfw.wmnet
  • 19:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T357189)', diff saved to https://phabricator.wikimedia.org/P57147 and previous config saved to /var/cache/conftool/dbconfig/20240219-192327-arnaudb.json
  • 19:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 10%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57146 and previous config saved to /var/cache/conftool/dbconfig/20240219-192018-arnaudb.json
  • 19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T357189)', diff saved to https://phabricator.wikimedia.org/P57145 and previous config saved to /var/cache/conftool/dbconfig/20240219-191923-arnaudb.json
  • 19:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 19:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T357189)', diff saved to https://phabricator.wikimedia.org/P57144 and previous config saved to /var/cache/conftool/dbconfig/20240219-191901-arnaudb.json
  • 19:14 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="Yann" . # T357297
  • 19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 8%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57143 and previous config saved to /var/cache/conftool/dbconfig/20240219-190513-arnaudb.json
  • 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P57142 and previous config saved to /var/cache/conftool/dbconfig/20240219-190354-arnaudb.json
  • 18:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 4%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57141 and previous config saved to /var/cache/conftool/dbconfig/20240219-185008-arnaudb.json
  • 18:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P57140 and previous config saved to /var/cache/conftool/dbconfig/20240219-184848-arnaudb.json
  • 18:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 2%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57139 and previous config saved to /var/cache/conftool/dbconfig/20240219-183503-arnaudb.json
  • 18:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T357189)', diff saved to https://phabricator.wikimedia.org/P57138 and previous config saved to /var/cache/conftool/dbconfig/20240219-183341-arnaudb.json
  • 18:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T357189)', diff saved to https://phabricator.wikimedia.org/P57137 and previous config saved to /var/cache/conftool/dbconfig/20240219-182929-arnaudb.json
  • 18:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T357189)', diff saved to https://phabricator.wikimedia.org/P57136 and previous config saved to /var/cache/conftool/dbconfig/20240219-182905-arnaudb.json
  • 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3316 (re)pooling @ 1%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57135 and previous config saved to /var/cache/conftool/dbconfig/20240219-181958-arnaudb.json
  • 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 100%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57134 and previous config saved to /var/cache/conftool/dbconfig/20240219-181953-arnaudb.json
  • 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P57133 and previous config saved to /var/cache/conftool/dbconfig/20240219-181359-arnaudb.json
  • 18:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 75%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57132 and previous config saved to /var/cache/conftool/dbconfig/20240219-180448-arnaudb.json
  • 17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P57131 and previous config saved to /var/cache/conftool/dbconfig/20240219-175853-arnaudb.json
  • 17:56 ladsgroup@cumin1002: START - Cookbook sre.mysql.clone of db2149.codfw.wmnet onto db2156.codfw.wmnet
  • 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 50%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57130 and previous config saved to /var/cache/conftool/dbconfig/20240219-174943-arnaudb.json
  • 17:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T357189)', diff saved to https://phabricator.wikimedia.org/P57129 and previous config saved to /var/cache/conftool/dbconfig/20240219-174347-arnaudb.json
  • 17:43 hnowlan: running `decommission` for mw2312.codfw.wmnet,mw2313.codfw.wmnet,mw2367.codfw.wmnet,mw2369.codfw.wmnet,mw2384.codfw.wmnet,mw2385.codfw.wmnet before reimaging to k8s workers
  • 17:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2117 (T357189)', diff saved to https://phabricator.wikimedia.org/P57128 and previous config saved to /var/cache/conftool/dbconfig/20240219-173941-arnaudb.json
  • 17:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 17:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 17:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T357189)', diff saved to https://phabricator.wikimedia.org/P57127 and previous config saved to /var/cache/conftool/dbconfig/20240219-173919-arnaudb.json
  • 17:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: recloning db2156 (T352010)
  • 17:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: recloning db2156 (T352010)
  • 17:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 40%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57126 and previous config saved to /var/cache/conftool/dbconfig/20240219-173438-arnaudb.json
  • 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2149 for maint', diff saved to https://phabricator.wikimedia.org/P57125 and previous config saved to /var/cache/conftool/dbconfig/20240219-173411-ladsgroup.json
  • 17:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P57124 and previous config saved to /var/cache/conftool/dbconfig/20240219-172412-arnaudb.json
  • 17:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 30%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57123 and previous config saved to /var/cache/conftool/dbconfig/20240219-171933-arnaudb.json
  • 17:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P57122 and previous config saved to /var/cache/conftool/dbconfig/20240219-170906-arnaudb.json
  • 17:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 20%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57121 and previous config saved to /var/cache/conftool/dbconfig/20240219-170428-arnaudb.json
  • 16:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57120 and previous config saved to /var/cache/conftool/dbconfig/20240219-165503-root.json
  • 16:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T357189)', diff saved to https://phabricator.wikimedia.org/P57119 and previous config saved to /var/cache/conftool/dbconfig/20240219-165400-arnaudb.json
  • 16:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2114 (T357189)', diff saved to https://phabricator.wikimedia.org/P57118 and previous config saved to /var/cache/conftool/dbconfig/20240219-165032-arnaudb.json
  • 16:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 16:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 10%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57117 and previous config saved to /var/cache/conftool/dbconfig/20240219-164924-arnaudb.json
  • 16:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 16:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T357189)', diff saved to https://phabricator.wikimedia.org/P57116 and previous config saved to /var/cache/conftool/dbconfig/20240219-164809-arnaudb.json
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57115 and previous config saved to /var/cache/conftool/dbconfig/20240219-163958-root.json
  • 16:38 jgiannelos@deploy2002: Finished deploy [restbase/deploy@7e5e720]: Disable parsoid storage on restbase[1031:1033] (duration: 01m 55s)
  • 16:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@7e5e720]: Disable parsoid storage on restbase[1031:1033]
  • 16:35 jgiannelos@deploy2002: Finished deploy [restbase/deploy@7e5e720]: Disable parsoid storage on restbase[2033:2035] (duration: 01m 19s)
  • 16:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 8%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57114 and previous config saved to /var/cache/conftool/dbconfig/20240219-163419-arnaudb.json
  • 16:33 jgiannelos@deploy2002: Started deploy [restbase/deploy@7e5e720]: Disable parsoid storage on restbase[2033:2035]
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P57113 and previous config saved to /var/cache/conftool/dbconfig/20240219-163303-arnaudb.json
  • 16:32 jgiannelos@deploy2002: deploy aborted: Disable parsoid storage on all nodes (duration: 01m 57s)
  • 16:30 jgiannelos@deploy2002: Started deploy [restbase/deploy@7e5e720]: Disable parsoid storage on all nodes
  • 16:30 jgiannelos@deploy2002: Finished deploy [restbase/deploy@7e5e720]: Disable parsoid storage on all nodes (duration: 00m 07s)
  • 16:30 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:30 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:30 jgiannelos@deploy2002: Started deploy [restbase/deploy@7e5e720]: Disable parsoid storage on all nodes
  • 16:29 jgiannelos@deploy2002: deploy aborted: Disable parsoid storage on all nodes (duration: 00m 08s)
  • 16:29 jgiannelos@deploy2002: Started deploy [restbase/deploy@7e5e720]: Disable parsoid storage on all nodes
  • 16:29 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:29 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57112 and previous config saved to /var/cache/conftool/dbconfig/20240219-162453-root.json
  • 16:21 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 00m 04s)
  • 16:21 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 16:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 4%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57111 and previous config saved to /var/cache/conftool/dbconfig/20240219-161914-arnaudb.json
  • 16:19 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 00m 07s)
  • 16:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 16:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P57110 and previous config saved to /var/cache/conftool/dbconfig/20240219-161756-arnaudb.json
  • 16:17 jgiannelos@deploy2002: deploy aborted: Deploy latest restbase config in all nodes (duration: 00m 04s)
  • 16:16 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Deploy latest restbase config in all nodes
  • 16:14 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 00m 08s)
  • 16:14 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 16:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57109 and previous config saved to /var/cache/conftool/dbconfig/20240219-160948-root.json
  • 16:04 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 00m 23s)
  • 16:04 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 16:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 2%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57108 and previous config saved to /var/cache/conftool/dbconfig/20240219-160409-arnaudb.json
  • 16:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T357189)', diff saved to https://phabricator.wikimedia.org/P57107 and previous config saved to /var/cache/conftool/dbconfig/20240219-160249-arnaudb.json
  • 16:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 100%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57106 and previous config saved to /var/cache/conftool/dbconfig/20240219-160221-arnaudb.json
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T357189)', diff saved to https://phabricator.wikimedia.org/P57105 and previous config saved to /var/cache/conftool/dbconfig/20240219-155936-arnaudb.json
  • 15:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 15:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 15:59 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[2029:2032] (duration: 02m 56s)
  • 15:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T357189)', diff saved to https://phabricator.wikimedia.org/P57104 and previous config saved to /var/cache/conftool/dbconfig/20240219-155702-arnaudb.json
  • 15:56 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[2029:2032]
  • 15:55 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[1027:1030] (duration: 04m 11s)
  • 15:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57103 and previous config saved to /var/cache/conftool/dbconfig/20240219-155443-root.json
  • 15:51 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[1027:1030]
  • 15:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2194:3317 (re)pooling @ 1%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57102 and previous config saved to /var/cache/conftool/dbconfig/20240219-154904-arnaudb.json
  • 15:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 75%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57101 and previous config saved to /var/cache/conftool/dbconfig/20240219-154716-arnaudb.json
  • 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P57100 and previous config saved to /var/cache/conftool/dbconfig/20240219-154154-arnaudb.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'T343674 - db2194 missing config', diff saved to https://phabricator.wikimedia.org/P57099 and previous config saved to /var/cache/conftool/dbconfig/20240219-154148-arnaudb.json
  • 15:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1213.eqiad.wmnet with OS bookworm
  • 15:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57098 and previous config saved to /var/cache/conftool/dbconfig/20240219-153938-root.json
  • 15:37 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[2025:2028] (duration: 01m 28s)
  • 15:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[2025:2028]
  • 15:35 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase1026 (duration: 01m 55s)
  • 15:33 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase1026
  • 15:33 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[1023:1025] (duration: 01m 57s)
  • 15:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 50%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57097 and previous config saved to /var/cache/conftool/dbconfig/20240219-153211-arnaudb.json
  • 15:31 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase[1023:1025]
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P57096 and previous config saved to /var/cache/conftool/dbconfig/20240219-152634-arnaudb.json
  • 15:24 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase2024 (duration: 01m 24s)
  • 15:23 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:22 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: Disable parsoid storage on restbase2024
  • 15:22 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Increase move rate limit for extendedmovers in arwiki to 16/60 (T357229) (duration: 24m 34s)
  • 15:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1213.eqiad.wmnet with reason: host reimage
  • 15:22 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 30s)
  • 15:20 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 15:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1213.eqiad.wmnet with reason: host reimage
  • 15:19 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 54s)
  • 15:17 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 15:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 40%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57095 and previous config saved to /var/cache/conftool/dbconfig/20240219-151706-arnaudb.json
  • 15:15 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 55s)
  • 15:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 15:14 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and gergesshamon: Continuing with sync
  • 15:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 15:13 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 15:13 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 28s)
  • 15:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57094 and previous config saved to /var/cache/conftool/dbconfig/20240219-151246-root.json
  • 15:11 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 15:11 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 55s)
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T357189)', diff saved to https://phabricator.wikimedia.org/P57093 and previous config saved to /var/cache/conftool/dbconfig/20240219-151127-arnaudb.json
  • 15:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 15:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 15:09 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 14:53 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 24s)
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P57087 and previous config saved to /var/cache/conftool/dbconfig/20240219-145251-arnaudb.json
  • 14:51 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T352010)', diff saved to https://phabricator.wikimedia.org/P57086 and previous config saved to /var/cache/conftool/dbconfig/20240219-145119-ladsgroup.json
  • 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T352010)', diff saved to https://phabricator.wikimedia.org/P57085 and previous config saved to /var/cache/conftool/dbconfig/20240219-145057-ladsgroup.json
  • 14:49 jgiannelos@deploy2002: Finished deploy [restbase/deploy@e5ed8d0]: (no justification provided) (duration: 01m 51s)
  • 14:48 jgiannelos@deploy2002: Started deploy [restbase/deploy@e5ed8d0]: (no justification provided)
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 20%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57084 and previous config saved to /var/cache/conftool/dbconfig/20240219-144655-arnaudb.json
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57083 and previous config saved to /var/cache/conftool/dbconfig/20240219-144422-root.json
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57082 and previous config saved to /var/cache/conftool/dbconfig/20240219-144237-root.json
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P57081 and previous config saved to /var/cache/conftool/dbconfig/20240219-143744-arnaudb.json
  • 14:37 reedy@deploy2002: Finished scap: Fix casing of MediaWiki (duration: 09m 11s)
  • 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P57080 and previous config saved to /var/cache/conftool/dbconfig/20240219-143550-ladsgroup.json
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 10%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57079 and previous config saved to /var/cache/conftool/dbconfig/20240219-143150-arnaudb.json
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 100%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57078 and previous config saved to /var/cache/conftool/dbconfig/20240219-143145-arnaudb.json
  • 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57077 and previous config saved to /var/cache/conftool/dbconfig/20240219-142917-root.json
  • 14:28 reedy@deploy2002: Started scap: Fix casing of MediaWiki
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1020 (re)pooling @ 25%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57076 and previous config saved to /var/cache/conftool/dbconfig/20240219-142732-root.json
  • 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T357189)', diff saved to https://phabricator.wikimedia.org/P57075 and previous config saved to /var/cache/conftool/dbconfig/20240219-142238-arnaudb.json
  • 14:20 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P57074 and previous config saved to /var/cache/conftool/dbconfig/20240219-142044-ladsgroup.json
  • 14:19 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:19 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T357189)', diff saved to https://phabricator.wikimedia.org/P57073 and previous config saved to /var/cache/conftool/dbconfig/20240219-141919-arnaudb.json
  • 14:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T357189)', diff saved to https://phabricator.wikimedia.org/P57072 and previous config saved to /var/cache/conftool/dbconfig/20240219-141858-arnaudb.json
  • 14:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 75%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57071 and previous config saved to /var/cache/conftool/dbconfig/20240219-141640-arnaudb.json
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57070 and previous config saved to /var/cache/conftool/dbconfig/20240219-141412-root.json
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1020 (re)pooling @ 10%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57069 and previous config saved to /var/cache/conftool/dbconfig/20240219-141227-root.json
  • 14:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T352010)', diff saved to https://phabricator.wikimedia.org/P57068 and previous config saved to /var/cache/conftool/dbconfig/20240219-140538-ladsgroup.json
  • 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P57067 and previous config saved to /var/cache/conftool/dbconfig/20240219-140351-arnaudb.json
  • 14:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 50%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57066 and previous config saved to /var/cache/conftool/dbconfig/20240219-140135-arnaudb.json
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57065 and previous config saved to /var/cache/conftool/dbconfig/20240219-135907-root.json
  • 13:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1027.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1027.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1026.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1026.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1024.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1024.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1023.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1023.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1020 (re)pooling @ 5%: After migration to 10.6', diff saved to https://phabricator.wikimedia.org/P57064 and previous config saved to /var/cache/conftool/dbconfig/20240219-135722-root.json
  • 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P57063 and previous config saved to /var/cache/conftool/dbconfig/20240219-134845-arnaudb.json
  • 13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1020', diff saved to https://phabricator.wikimedia.org/P57062 and previous config saved to /var/cache/conftool/dbconfig/20240219-134804-root.json
  • 13:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 40%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57061 and previous config saved to /var/cache/conftool/dbconfig/20240219-134630-arnaudb.json
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1021 to es4 primary ', diff saved to https://phabricator.wikimedia.org/P57060 and previous config saved to /var/cache/conftool/dbconfig/20240219-134551-root.json
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57059 and previous config saved to /var/cache/conftool/dbconfig/20240219-134402-root.json
  • 13:43 marostegui: Starting es4 eqiad failover from es1020 to es1021 - T357904
  • 13:42 marostegui@cumin1002: dbctl commit (dc=all): 'Change weight of es1021', diff saved to https://phabricator.wikimedia.org/P57058 and previous config saved to /var/cache/conftool/dbconfig/20240219-134205-root.json
  • 13:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: es4 switchover T357904
  • 13:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: es4 switchover T357904
  • 13:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1021.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1021.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1020.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1020.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T357189)', diff saved to https://phabricator.wikimedia.org/P57057 and previous config saved to /var/cache/conftool/dbconfig/20240219-133339-arnaudb.json
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1021', diff saved to https://phabricator.wikimedia.org/P57056 and previous config saved to /var/cache/conftool/dbconfig/20240219-133245-root.json
  • 13:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 30%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57055 and previous config saved to /var/cache/conftool/dbconfig/20240219-133125-arnaudb.json
  • 13:30 moritzm: installing runc security updates on buster
  • 13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T357189)', diff saved to https://phabricator.wikimedia.org/P57054 and previous config saved to /var/cache/conftool/dbconfig/20240219-133019-arnaudb.json
  • 13:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T357189)', diff saved to https://phabricator.wikimedia.org/P57053 and previous config saved to /var/cache/conftool/dbconfig/20240219-132958-arnaudb.json
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2170 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57052 and previous config saved to /var/cache/conftool/dbconfig/20240219-132858-root.json
  • 13:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1025.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1025.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2170 depooled', diff saved to https://phabricator.wikimedia.org/P57051 and previous config saved to /var/cache/conftool/dbconfig/20240219-131729-marostegui.json
  • 13:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on dbproxy1022.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on dbproxy1022.eqiad.wmnet with reason: Silence for reboot T356240
  • 13:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 20%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57050 and previous config saved to /var/cache/conftool/dbconfig/20240219-131620-arnaudb.json
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db21170 multi-instance', diff saved to https://phabricator.wikimedia.org/P57049 and previous config saved to /var/cache/conftool/dbconfig/20240219-131609-marostegui.json
  • 13:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P57048 and previous config saved to /var/cache/conftool/dbconfig/20240219-131452-arnaudb.json
  • 13:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2170 T354826', diff saved to https://phabricator.wikimedia.org/P57047 and previous config saved to /var/cache/conftool/dbconfig/20240219-131245-root.json
  • 13:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 10%: Cloning to db2194 done', diff saved to https://phabricator.wikimedia.org/P57046 and previous config saved to /var/cache/conftool/dbconfig/20240219-130116-arnaudb.json
  • 12:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P57045 and previous config saved to /var/cache/conftool/dbconfig/20240219-125945-arnaudb.json
  • 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57044 and previous config saved to /var/cache/conftool/dbconfig/20240219-125456-root.json
  • 12:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T357189)', diff saved to https://phabricator.wikimedia.org/P57043 and previous config saved to /var/cache/conftool/dbconfig/20240219-124439-arnaudb.json
  • 12:44 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:43 hnowlan: migrating refreshLinks to k8s jobrunners
  • 12:42 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:42 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T357189)', diff saved to https://phabricator.wikimedia.org/P57042 and previous config saved to /var/cache/conftool/dbconfig/20240219-124115-arnaudb.json
  • 12:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T357189)', diff saved to https://phabricator.wikimedia.org/P57041 and previous config saved to /var/cache/conftool/dbconfig/20240219-124054-arnaudb.json
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57040 and previous config saved to /var/cache/conftool/dbconfig/20240219-123951-root.json
  • 12:37 aborrero@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1032
  • 12:37 aborrero@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1032
  • 12:36 hnowlan@deploy2002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:36 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:36 hnowlan@deploy2002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:36 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:35 aborrero@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1032
  • 12:35 aborrero@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1032
  • 12:35 hnowlan@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:35 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 12:35 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 12:35 hnowlan@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 12:32 samtar@deploy2002: Finished scap: Backport for IS/CS: Add wmgEditRecoveryDefaultUserOptions (T350653) (duration: 10m 21s)
  • 12:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P57039 and previous config saved to /var/cache/conftool/dbconfig/20240219-122547-arnaudb.json
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57038 and previous config saved to /var/cache/conftool/dbconfig/20240219-122446-root.json
  • 12:24 samtar@deploy2002: samtar: Continuing with sync
  • 12:23 samtar@deploy2002: samtar: Backport for IS/CS: Add wmgEditRecoveryDefaultUserOptions (T350653) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:21 samtar@deploy2002: Started scap: Backport for IS/CS: Add wmgEditRecoveryDefaultUserOptions (T350653)
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57037 and previous config saved to /var/cache/conftool/dbconfig/20240219-122142-root.json
  • 12:19 samtar@deploy2002: backport Cancelled
  • 12:18 samtar@deploy2002: backport Cancelled
  • 12:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P57035 and previous config saved to /var/cache/conftool/dbconfig/20240219-121040-arnaudb.json
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57034 and previous config saved to /var/cache/conftool/dbconfig/20240219-120951-root.json
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57033 and previous config saved to /var/cache/conftool/dbconfig/20240219-120941-root.json
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57032 and previous config saved to /var/cache/conftool/dbconfig/20240219-120637-root.json
  • 12:03 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 11:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T357189)', diff saved to https://phabricator.wikimedia.org/P57031 and previous config saved to /var/cache/conftool/dbconfig/20240219-115534-arnaudb.json
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57030 and previous config saved to /var/cache/conftool/dbconfig/20240219-115439-root.json
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57029 and previous config saved to /var/cache/conftool/dbconfig/20240219-115436-root.json
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57028 and previous config saved to /var/cache/conftool/dbconfig/20240219-115435-root.json
  • 11:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T357189)', diff saved to https://phabricator.wikimedia.org/P57027 and previous config saved to /var/cache/conftool/dbconfig/20240219-115210-arnaudb.json
  • 11:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T357189)', diff saved to https://phabricator.wikimedia.org/P57026 and previous config saved to /var/cache/conftool/dbconfig/20240219-115138-arnaudb.json
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57025 and previous config saved to /var/cache/conftool/dbconfig/20240219-115132-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57024 and previous config saved to /var/cache/conftool/dbconfig/20240219-113934-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2138 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57023 and previous config saved to /var/cache/conftool/dbconfig/20240219-113931-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57022 and previous config saved to /var/cache/conftool/dbconfig/20240219-113931-root.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Place db2138 in s2 T354826', diff saved to https://phabricator.wikimedia.org/P57021 and previous config saved to /var/cache/conftool/dbconfig/20240219-113926-marostegui.json
  • 11:37 ariel@deploy2002: Finished deploy [dumps/dumps@0d1f9be]: improvements to page content history backfill script (duration: 00m 04s)
  • 11:37 ariel@deploy2002: Started deploy [dumps/dumps@0d1f9be]: improvements to page content history backfill script
  • 11:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P57020 and previous config saved to /var/cache/conftool/dbconfig/20240219-113632-arnaudb.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57019 and previous config saved to /var/cache/conftool/dbconfig/20240219-113627-root.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'place db2138 in s2', diff saved to https://phabricator.wikimedia.org/P57018 and previous config saved to /var/cache/conftool/dbconfig/20240219-113622-marostegui.json
  • 11:34 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 11:28 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2138 T354826', diff saved to https://phabricator.wikimedia.org/P57017 and previous config saved to /var/cache/conftool/dbconfig/20240219-112405-root.json
  • 11:23 taavi: update cr*-codfw firewall policy for puppetmaster2003 -> puppetserver2003 rename
  • 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57016 and previous config saved to /var/cache/conftool/dbconfig/20240219-112311-root.json
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57015 and previous config saved to /var/cache/conftool/dbconfig/20240219-112256-root.json
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57014 and previous config saved to /var/cache/conftool/dbconfig/20240219-112030-root.json
  • 11:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P57013 and previous config saved to /var/cache/conftool/dbconfig/20240219-111819-arnaudb.json
  • 11:11 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 11:10 claime: sudo cumin -b 20 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 11:09 claime: Running puppet on failed nodes
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57012 and previous config saved to /var/cache/conftool/dbconfig/20240219-110806-root.json
  • 11:08 claime: puppetserver roll-restart done
  • 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57011 and previous config saved to /var/cache/conftool/dbconfig/20240219-110751-root.json
  • 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57010 and previous config saved to /var/cache/conftool/dbconfig/20240219-110525-root.json
  • 11:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T357189)', diff saved to https://phabricator.wikimedia.org/P57009 and previous config saved to /var/cache/conftool/dbconfig/20240219-110312-arnaudb.json
  • 11:00 claime: sudo cumin -s 10 -b 1 A:puppetserver 'systemctl restart puppetserver.service'
  • 11:00 claime: roll-restarting puppetserver
  • 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T357189)', diff saved to https://phabricator.wikimedia.org/P57008 and previous config saved to /var/cache/conftool/dbconfig/20240219-105949-arnaudb.json
  • 10:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:56 claime: restarting puppetserver on puppetserver1001
  • 10:54 godog: bounce thanos-query on titan1* - T356788
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57007 and previous config saved to /var/cache/conftool/dbconfig/20240219-105302-root.json
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57006 and previous config saved to /var/cache/conftool/dbconfig/20240219-105246-root.json
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2137 into s4, depooled', diff saved to https://phabricator.wikimedia.org/P57005 and previous config saved to /var/cache/conftool/dbconfig/20240219-105211-marostegui.json
  • 10:48 godog: bounce thanos-query on titan2* - T356788
  • 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'Place db2137 in s4 T354826', diff saved to https://phabricator.wikimedia.org/P57004 and previous config saved to /var/cache/conftool/dbconfig/20240219-104556-marostegui.json
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2137 T354826', diff saved to https://phabricator.wikimedia.org/P57002 and previous config saved to /var/cache/conftool/dbconfig/20240219-103939-root.json
  • 10:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2166.codfw.wmnet onto db2167.codfw.wmnet
  • 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57001 and previous config saved to /var/cache/conftool/dbconfig/20240219-103741-root.json
  • 10:33 cgoubert@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=thanos-query,name=eqiad
  • 10:33 claime: repooling thanos-query eqiad - T356788
  • 10:26 claime: restarting thanos-query.service - titan1001 - T356788
  • 10:22 claime: restarting thanos-query.service - titan1002 - T356788
  • 10:22 cgoubert@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=thanos-query,name=eqiad
  • 10:22 claime: depooling thanos-query eqiad - T356788
  • 10:11 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage
  • 10:11 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage
  • 10:10 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::eqiad1::cloudweb
  • 10:10 claime: restarting thanos-query.service - titan1002 - T356788
  • 10:05 claime: restarting thanos-query.service - titan1001 - T356788
  • 10:04 claime: restarting thanos-query.service - titan1001
  • 10:02 taavi@cumin1002: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::eqiad1::cloudweb
  • 09:59 taavi@cumin1002: conftool action : set/pooled=yes; selector: name=cloudweb1004.wikimedia.org
  • 09:55 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS bullseye
  • 09:49 claime: Draining mw2442 - failed RAID - T357380
  • 09:27 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 09:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
  • 09:12 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS bullseye
  • 09:10 moritzm: installing gnutls28 security updates on bookworm
  • 09:06 taavi@cumin1002: conftool action : set/pooled=inactive; selector: name=cloudweb1004.wikimedia.org
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P57000 and previous config saved to /var/cache/conftool/dbconfig/20240219-090600-root.json
  • 09:01 ladsgroup@deploy2002: Finished scap: Backport for Set fawiki to read new in pagelinks (T351237) (duration: 09m 43s)
  • 08:54 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 08:53 ladsgroup@deploy2002: ladsgroup: Backport for Set fawiki to read new in pagelinks (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:51 ladsgroup@deploy2002: Started scap: Backport for Set fawiki to read new in pagelinks (T351237)
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56999 and previous config saved to /var/cache/conftool/dbconfig/20240219-085055-root.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56998 and previous config saved to /var/cache/conftool/dbconfig/20240219-083840-root.json
  • 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56997 and previous config saved to /var/cache/conftool/dbconfig/20240219-083550-root.json
  • 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 08:25 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56996 and previous config saved to /var/cache/conftool/dbconfig/20240219-082336-root.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2166 T354826', diff saved to https://phabricator.wikimedia.org/P56995 and previous config saved to /var/cache/conftool/dbconfig/20240219-082321-root.json
  • 08:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56994 and previous config saved to /var/cache/conftool/dbconfig/20240219-082121-root.json
  • 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56993 and previous config saved to /var/cache/conftool/dbconfig/20240219-082045-root.json
  • 08:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2119 (T352010)', diff saved to https://phabricator.wikimedia.org/P56992 and previous config saved to /var/cache/conftool/dbconfig/20240219-081920-ladsgroup.json
  • 08:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 08:16 moritzm: installing runc security updates on buster
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Place db2167 in s8 T354826', diff saved to https://phabricator.wikimedia.org/P56991 and previous config saved to /var/cache/conftool/dbconfig/20240219-081132-marostegui.json
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56990 and previous config saved to /var/cache/conftool/dbconfig/20240219-080831-root.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2167 multiinstance', diff saved to https://phabricator.wikimedia.org/P56989 and previous config saved to /var/cache/conftool/dbconfig/20240219-080744-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56988 and previous config saved to /var/cache/conftool/dbconfig/20240219-080616-root.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56987 and previous config saved to /var/cache/conftool/dbconfig/20240219-080612-root.json
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56986 and previous config saved to /var/cache/conftool/dbconfig/20240219-080540-root.json
  • 08:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2167 T354826', diff saved to https://phabricator.wikimedia.org/P56985 and previous config saved to /var/cache/conftool/dbconfig/20240219-080322-root.json
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56984 and previous config saved to /var/cache/conftool/dbconfig/20240219-075325-root.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56983 and previous config saved to /var/cache/conftool/dbconfig/20240219-075111-root.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56982 and previous config saved to /var/cache/conftool/dbconfig/20240219-075107-root.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2168 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56981 and previous config saved to /var/cache/conftool/dbconfig/20240219-075035-root.json
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Place db2168 in s7 T354826', diff saved to https://phabricator.wikimedia.org/P56980 and previous config saved to /var/cache/conftool/dbconfig/20240219-074609-marostegui.json
  • 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2168 multiinstance', diff saved to https://phabricator.wikimedia.org/P56979 and previous config saved to /var/cache/conftool/dbconfig/20240219-074450-marostegui.json
  • 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2168 T354826', diff saved to https://phabricator.wikimedia.org/P56978 and previous config saved to /var/cache/conftool/dbconfig/20240219-074148-root.json
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56977 and previous config saved to /var/cache/conftool/dbconfig/20240219-073820-root.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56976 and previous config saved to /var/cache/conftool/dbconfig/20240219-073606-root.json
  • 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56975 and previous config saved to /var/cache/conftool/dbconfig/20240219-073602-root.json
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56974 and previous config saved to /var/cache/conftool/dbconfig/20240219-073521-root.json
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56973 and previous config saved to /var/cache/conftool/dbconfig/20240219-072315-root.json
  • 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56972 and previous config saved to /var/cache/conftool/dbconfig/20240219-072101-root.json
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56971 and previous config saved to /var/cache/conftool/dbconfig/20240219-072057-root.json
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56970 and previous config saved to /var/cache/conftool/dbconfig/20240219-072016-root.json
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1213 in s5 T354826', diff saved to https://phabricator.wikimedia.org/P56969 and previous config saved to /var/cache/conftool/dbconfig/20240219-071658-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1213 multiinstance', diff saved to https://phabricator.wikimedia.org/P56968 and previous config saved to /var/cache/conftool/dbconfig/20240219-071604-marostegui.json
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213 T354826', diff saved to https://phabricator.wikimedia.org/P56967 and previous config saved to /var/cache/conftool/dbconfig/20240219-070815-root.json
  • 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56966 and previous config saved to /var/cache/conftool/dbconfig/20240219-070556-root.json
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56965 and previous config saved to /var/cache/conftool/dbconfig/20240219-070552-root.json
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56964 and previous config saved to /var/cache/conftool/dbconfig/20240219-070511-root.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1246 in s2 T354826', diff saved to https://phabricator.wikimedia.org/P56963 and previous config saved to /var/cache/conftool/dbconfig/20240219-070212-marostegui.json
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1246 multiinstance', diff saved to https://phabricator.wikimedia.org/P56962 and previous config saved to /var/cache/conftool/dbconfig/20240219-065848-marostegui.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1246 T354826', diff saved to https://phabricator.wikimedia.org/P56961 and previous config saved to /var/cache/conftool/dbconfig/20240219-065456-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56960 and previous config saved to /var/cache/conftool/dbconfig/20240219-065048-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56959 and previous config saved to /var/cache/conftool/dbconfig/20240219-065007-root.json
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1244 in s4 T354826', diff saved to https://phabricator.wikimedia.org/P56958 and previous config saved to /var/cache/conftool/dbconfig/20240219-064350-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1244 in s4 T354826', diff saved to https://phabricator.wikimedia.org/P56957 and previous config saved to /var/cache/conftool/dbconfig/20240219-064157-marostegui.json
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56956 and previous config saved to /var/cache/conftool/dbconfig/20240219-063502-root.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1244 T354826', diff saved to https://phabricator.wikimedia.org/P56955 and previous config saved to /var/cache/conftool/dbconfig/20240219-063457-root.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: After rearraging sections T354826', diff saved to https://phabricator.wikimedia.org/P56954 and previous config saved to /var/cache/conftool/dbconfig/20240219-061957-root.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1170 in s7 T354826', diff saved to https://phabricator.wikimedia.org/P56953 and previous config saved to /var/cache/conftool/dbconfig/20240219-061919-marostegui.json
  • 06:17 marostegui@deploy2002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" (duration: 19m 02s)
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1170 in s7 T354826', diff saved to https://phabricator.wikimedia.org/P56952 and previous config saved to /var/cache/conftool/dbconfig/20240219-061548-marostegui.json
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1170 T354826', diff saved to https://phabricator.wikimedia.org/P56951 and previous config saved to /var/cache/conftool/dbconfig/20240219-061121-root.json
  • 06:08 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:08 marostegui@deploy2002: marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:58 marostegui@deploy2002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 master"
  • 05:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Primary switchover pc1 T356371
  • 05:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Primary switchover pc1 T356371

2024-02-18

  • 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T352010)', diff saved to https://phabricator.wikimedia.org/P56950 and previous config saved to /var/cache/conftool/dbconfig/20240218-231102-ladsgroup.json
  • 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P56949 and previous config saved to /var/cache/conftool/dbconfig/20240218-225556-ladsgroup.json
  • 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P56948 and previous config saved to /var/cache/conftool/dbconfig/20240218-224049-ladsgroup.json
  • 22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T352010)', diff saved to https://phabricator.wikimedia.org/P56947 and previous config saved to /var/cache/conftool/dbconfig/20240218-222543-ladsgroup.json
  • 21:10 eileen: civicrm upgraded from 45a0138c to 5af300d4
  • 17:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T352010)', diff saved to https://phabricator.wikimedia.org/P56945 and previous config saved to /var/cache/conftool/dbconfig/20240218-171526-ladsgroup.json
  • 17:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T352010)', diff saved to https://phabricator.wikimedia.org/P56944 and previous config saved to /var/cache/conftool/dbconfig/20240218-171502-ladsgroup.json
  • 16:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56943 and previous config saved to /var/cache/conftool/dbconfig/20240218-165955-ladsgroup.json
  • 16:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56942 and previous config saved to /var/cache/conftool/dbconfig/20240218-164448-ladsgroup.json
  • 16:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T352010)', diff saved to https://phabricator.wikimedia.org/P56941 and previous config saved to /var/cache/conftool/dbconfig/20240218-162942-ladsgroup.json
  • 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T352010)', diff saved to https://phabricator.wikimedia.org/P56940 and previous config saved to /var/cache/conftool/dbconfig/20240218-111954-ladsgroup.json
  • 11:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P56939 and previous config saved to /var/cache/conftool/dbconfig/20240218-111915-ladsgroup.json
  • 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56938 and previous config saved to /var/cache/conftool/dbconfig/20240218-110408-ladsgroup.json
  • 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56937 and previous config saved to /var/cache/conftool/dbconfig/20240218-104901-ladsgroup.json
  • 10:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P56936 and previous config saved to /var/cache/conftool/dbconfig/20240218-103355-ladsgroup.json
  • 09:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T352010)', diff saved to https://phabricator.wikimedia.org/P56935 and previous config saved to /var/cache/conftool/dbconfig/20240218-093323-ladsgroup.json
  • 09:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T352010)', diff saved to https://phabricator.wikimedia.org/P56934 and previous config saved to /var/cache/conftool/dbconfig/20240218-093301-ladsgroup.json
  • 09:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P56933 and previous config saved to /var/cache/conftool/dbconfig/20240218-091754-ladsgroup.json
  • 09:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P56932 and previous config saved to /var/cache/conftool/dbconfig/20240218-090248-ladsgroup.json
  • 08:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T352010)', diff saved to https://phabricator.wikimedia.org/P56931 and previous config saved to /var/cache/conftool/dbconfig/20240218-084741-ladsgroup.json
  • 03:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T352010)', diff saved to https://phabricator.wikimedia.org/P56930 and previous config saved to /var/cache/conftool/dbconfig/20240218-035542-ladsgroup.json
  • 03:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance

2024-02-17

  • 23:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 23:42 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 23:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56929 and previous config saved to /var/cache/conftool/dbconfig/20240217-234216-ladsgroup.json
  • 23:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P56928 and previous config saved to /var/cache/conftool/dbconfig/20240217-232709-ladsgroup.json
  • 23:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P56927 and previous config saved to /var/cache/conftool/dbconfig/20240217-231203-ladsgroup.json
  • 22:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56926 and previous config saved to /var/cache/conftool/dbconfig/20240217-225656-ladsgroup.json
  • 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56925 and previous config saved to /var/cache/conftool/dbconfig/20240217-175100-ladsgroup.json
  • 17:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 17:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56924 and previous config saved to /var/cache/conftool/dbconfig/20240217-175038-ladsgroup.json
  • 17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56923 and previous config saved to /var/cache/conftool/dbconfig/20240217-173531-ladsgroup.json
  • 17:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56922 and previous config saved to /var/cache/conftool/dbconfig/20240217-172024-ladsgroup.json
  • 17:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56921 and previous config saved to /var/cache/conftool/dbconfig/20240217-170518-ladsgroup.json
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2137:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56920 and previous config saved to /var/cache/conftool/dbconfig/20240217-115446-ladsgroup.json
  • 11:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T352010)', diff saved to https://phabricator.wikimedia.org/P56919 and previous config saved to /var/cache/conftool/dbconfig/20240217-115422-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P56918 and previous config saved to /var/cache/conftool/dbconfig/20240217-113916-ladsgroup.json
  • 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P56917 and previous config saved to /var/cache/conftool/dbconfig/20240217-112409-ladsgroup.json
  • 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T352010)', diff saved to https://phabricator.wikimedia.org/P56916 and previous config saved to /var/cache/conftool/dbconfig/20240217-110903-ladsgroup.json
  • 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T352010)', diff saved to https://phabricator.wikimedia.org/P56915 and previous config saved to /var/cache/conftool/dbconfig/20240217-100830-ladsgroup.json
  • 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T352010)', diff saved to https://phabricator.wikimedia.org/P56914 and previous config saved to /var/cache/conftool/dbconfig/20240217-100809-ladsgroup.json
  • 09:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P56913 and previous config saved to /var/cache/conftool/dbconfig/20240217-095302-ladsgroup.json
  • 09:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P56912 and previous config saved to /var/cache/conftool/dbconfig/20240217-093755-ladsgroup.json
  • 09:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T352010)', diff saved to https://phabricator.wikimedia.org/P56911 and previous config saved to /var/cache/conftool/dbconfig/20240217-092249-ladsgroup.json
  • 08:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2119 (T352010)', diff saved to https://phabricator.wikimedia.org/P56910 and previous config saved to /var/cache/conftool/dbconfig/20240217-082217-ladsgroup.json
  • 08:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T352010)', diff saved to https://phabricator.wikimedia.org/P56909 and previous config saved to /var/cache/conftool/dbconfig/20240217-082155-ladsgroup.json
  • 08:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P56908 and previous config saved to /var/cache/conftool/dbconfig/20240217-080649-ladsgroup.json
  • 07:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P56907 and previous config saved to /var/cache/conftool/dbconfig/20240217-075142-ladsgroup.json
  • 07:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T352010)', diff saved to https://phabricator.wikimedia.org/P56906 and previous config saved to /var/cache/conftool/dbconfig/20240217-073636-ladsgroup.json
  • 02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2110 (T352010)', diff saved to https://phabricator.wikimedia.org/P56905 and previous config saved to /var/cache/conftool/dbconfig/20240217-022159-ladsgroup.json
  • 02:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 02:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T352010)', diff saved to https://phabricator.wikimedia.org/P56904 and previous config saved to /var/cache/conftool/dbconfig/20240217-022137-ladsgroup.json
  • 02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P56903 and previous config saved to /var/cache/conftool/dbconfig/20240217-020630-ladsgroup.json
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P56902 and previous config saved to /var/cache/conftool/dbconfig/20240217-015123-ladsgroup.json
  • 01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T352010)', diff saved to https://phabricator.wikimedia.org/P56901 and previous config saved to /var/cache/conftool/dbconfig/20240217-013617-ladsgroup.json

2024-02-16

  • 21:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2205.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2205.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2204.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2204.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2204.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2205.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2205.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2204.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2203.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2203 to codfw - jhancock@cumin2002"
  • 21:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2203 to codfw - jhancock@cumin2002"
  • 21:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2201.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2202.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2202.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2201.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:58 tzatziki: removing 2 files for legal compliance
  • 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T352010)', diff saved to https://phabricator.wikimedia.org/P56900 and previous config saved to /var/cache/conftool/dbconfig/20240216-204746-ladsgroup.json
  • 20:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T352010)', diff saved to https://phabricator.wikimedia.org/P56899 and previous config saved to /var/cache/conftool/dbconfig/20240216-204709-ladsgroup.json
  • 20:38 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:38 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:35 hashar@deploy2002: Finished deploy [integration/docroot@7a9d46f]: build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0 (duration: 00m 07s)
  • 20:35 hashar@deploy2002: Started deploy [integration/docroot@7a9d46f]: build: Upgrade mediawiki/mediawiki-codesniffer to v43.0.0
  • 20:33 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:33 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P56898 and previous config saved to /var/cache/conftool/dbconfig/20240216-203202-ladsgroup.json
  • 20:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P56897 and previous config saved to /var/cache/conftool/dbconfig/20240216-201656-ladsgroup.json
  • 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2202.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2106 (T352010)', diff saved to https://phabricator.wikimedia.org/P56896 and previous config saved to /var/cache/conftool/dbconfig/20240216-201239-ladsgroup.json
  • 20:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 20:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T352010)', diff saved to https://phabricator.wikimedia.org/P56895 and previous config saved to /var/cache/conftool/dbconfig/20240216-200149-ladsgroup.json
  • 19:56 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2201.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2201.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:50 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2201.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2202.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2201.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2200.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2200 to codfw - jhancock@cumin2002"
  • 19:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2200 to codfw - jhancock@cumin2002"
  • 19:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:08 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:50 jdrewniak@deploy2002: Finished scap: Backport for dd elements should have no margin (T357742) (duration: 14m 04s)
  • 17:43 jdrewniak@deploy2002: jdrewniak and kemayo: Continuing with sync
  • 17:37 jdrewniak@deploy2002: jdrewniak and kemayo: Backport for dd elements should have no margin (T357742) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:36 jdrewniak@deploy2002: Started scap: Backport for dd elements should have no margin (T357742)
  • 17:09 jdrewniak@deploy2002: Finished scap: Backport for Mitigates font size issues (T357724) (duration: 10m 04s)
  • 17:02 jdrewniak@deploy2002: jdrewniak and jdlrobson: Continuing with sync
  • 17:02 jdrewniak@deploy2002: jdrewniak and jdlrobson: Backport for Mitigates font size issues (T357724) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:59 jdrewniak@deploy2002: Started scap: Backport for Mitigates font size issues (T357724)
  • 16:53 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:53 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2199.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1044-1050].eqiad.wmnet
  • 16:39 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:39 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1044-1050].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
  • 16:36 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1044-1050].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
  • 16:34 mvernon@cumin1002: START - Cookbook sre.dns.netbox
  • 16:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2199.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2199 to codfw - jhancock@cumin2002"
  • 16:17 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:17 bking@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2199 to codfw - jhancock@cumin2002"
  • 16:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts ms-be[1044-1050].eqiad.wmnet
  • 16:04 ejegg: fundraising civicrm upgraded from 84ba0ccf to 45a0138c
  • 16:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be1047
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic100[1-4]* for decom hosts - bking@cumin2002 - T357780
  • 15:53 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic100[1-4]* for decom hosts - bking@cumin2002 - T357780
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 15:35 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw1349.eqiad.wmnet|mw1367.eqiad.wmnet|mw1476.eqiad.wmnet|mw1477.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 15:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-be1047.eqiad.wmnet
  • 15:28 hnowlan: running `homer 'cr*eqiad*' commit 'T351074'` for new k8s workers
  • 15:20 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be1047.eqiad.wmnet
  • 15:20 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be1047
  • 15:14 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be1046
  • 15:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 14:49 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be1046
  • 14:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 14:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 14:34 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --start '["73436010"]' | tee -a ~/T315510-enwiki
  • 14:33 Lucas_WMDE: STOP persistRevisionThreadItems.php on enwiki for T315510 again, I forgot to adjust the --start >.<
  • 14:33 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be1046
  • 14:32 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --start '["67578461"]' | tee -a ~/T315510-enwiki
  • 14:32 Lucas_WMDE: STOP persistRevisionThreadItems on enwiki for T315510 – for restart on wmf.18; last output: --start '["73436010"]'
  • 14:19 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cloudelastic1005.eqiad.wmnet
  • 14:19 bking@cumin2002: conftool action : set/weight=10; selector: name=cloudelastic1005.eqiad.wmnet
  • 14:18 mvernon@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-be1046.eqiad.wmnet
  • 14:08 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be1046.eqiad.wmnet
  • 14:08 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 14:07 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be1046
  • 14:07 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 14:07 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:06 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:06 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:06 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:47 hashar@deploy2002: Finished scap: Backport for Revert "Avoid creating a MWReferenceModel if not needed" (T357745) (duration: 13m 24s)
  • 13:39 hashar@deploy2002: matmarex and hashar: Continuing with sync
  • 13:37 hashar@deploy2002: matmarex and hashar: Backport for Revert "Avoid creating a MWReferenceModel if not needed" (T357745) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1477.eqiad.wmnet with OS bullseye
  • 13:34 hashar@deploy2002: Started scap: Backport for Revert "Avoid creating a MWReferenceModel if not needed" (T357745)
  • 13:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS bullseye
  • 13:23 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1367.eqiad.wmnet with OS bullseye
  • 13:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1476.eqiad.wmnet with OS bullseye
  • 13:20 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1477.eqiad.wmnet with reason: host reimage
  • 13:17 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1477.eqiad.wmnet with reason: host reimage
  • 13:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
  • 13:05 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1477.eqiad.wmnet with OS bullseye
  • 13:04 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1367.eqiad.wmnet with reason: host reimage
  • 13:02 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1476.eqiad.wmnet with reason: host reimage
  • 13:00 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1477.eqiad.wmnet with OS bullseye
  • 13:00 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
  • 13:00 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1367.eqiad.wmnet with reason: host reimage
  • 12:59 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1476.eqiad.wmnet with reason: host reimage
  • 12:47 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1477.eqiad.wmnet with OS bullseye
  • 12:46 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1476.eqiad.wmnet with OS bullseye
  • 12:46 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1367.eqiad.wmnet with OS bullseye
  • 12:46 taavi: publish docker-registry.discovery.wmnet/python3-bookworm:0.0.1
  • 12:46 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS bullseye
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T352010)', diff saved to https://phabricator.wikimedia.org/P56892 and previous config saved to /var/cache/conftool/dbconfig/20240216-121416-ladsgroup.json
  • 12:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:58 moritzm: update bullseye/bookworm netboot images on the Puppet 7 volatile environment to the latest point releases (to bring in sync with volatile for Puppet 5) T341056
  • 10:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T352010)', diff saved to https://phabricator.wikimedia.org/P56891 and previous config saved to /var/cache/conftool/dbconfig/20240216-105041-ladsgroup.json
  • 10:44 volans@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 10:44 volans@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 10:43 volans@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 10:42 volans@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1001.eqiad.wmnet
  • 10:41 hnowlan@cumin2002: conftool action : set/pooled=yes; selector: name=mw2379.codfw.wmnet
  • 10:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56890 and previous config saved to /var/cache/conftool/dbconfig/20240216-103535-ladsgroup.json
  • 10:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56889 and previous config saved to /var/cache/conftool/dbconfig/20240216-102028-ladsgroup.json
  • 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T352010)', diff saved to https://phabricator.wikimedia.org/P56888 and previous config saved to /var/cache/conftool/dbconfig/20240216-100521-ladsgroup.json
  • 10:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2194.codfw.wmnet with reason: Silence for WE
  • 10:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2194.codfw.wmnet with reason: Silence for WE
  • 09:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1036.eqiad.wmnet with OS bullseye
  • 09:07 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 09:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 08:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-redacteddb1001.eqiad.wmnet with OS bullseye
  • 08:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-redacteddb1001']
  • 08:07 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-redacteddb1001']
  • 06:04 apergos: manually generating 7z files in parallel for wikidata full history dumps run, in screen session, owned by ariel, on snapshot1009
  • 05:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T352010)', diff saved to https://phabricator.wikimedia.org/P56887 and previous config saved to /var/cache/conftool/dbconfig/20240216-052044-ladsgroup.json
  • 05:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 05:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 05:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T352010)', diff saved to https://phabricator.wikimedia.org/P56886 and previous config saved to /var/cache/conftool/dbconfig/20240216-052021-ladsgroup.json
  • 05:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P56885 and previous config saved to /var/cache/conftool/dbconfig/20240216-050514-ladsgroup.json
  • 05:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T352010)', diff saved to https://phabricator.wikimedia.org/P56884 and previous config saved to /var/cache/conftool/dbconfig/20240216-050458-ladsgroup.json
  • 04:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P56883 and previous config saved to /var/cache/conftool/dbconfig/20240216-045008-ladsgroup.json
  • 04:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P56882 and previous config saved to /var/cache/conftool/dbconfig/20240216-044952-ladsgroup.json
  • 04:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T352010)', diff saved to https://phabricator.wikimedia.org/P56881 and previous config saved to /var/cache/conftool/dbconfig/20240216-043501-ladsgroup.json
  • 04:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P56880 and previous config saved to /var/cache/conftool/dbconfig/20240216-043445-ladsgroup.json
  • 04:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T352010)', diff saved to https://phabricator.wikimedia.org/P56879 and previous config saved to /var/cache/conftool/dbconfig/20240216-041938-ladsgroup.json
  • 01:26 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 01:08 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@d93828e]: (no justification provided) (duration: 00m 28s)
  • 01:07 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@d93828e]: (no justification provided)
  • 00:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 00:28 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 00:27 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - ryankemper@cumin2002 - T356651
  • 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T352010)', diff saved to https://phabricator.wikimedia.org/P56877 and previous config saved to /var/cache/conftool/dbconfig/20240216-001636-ladsgroup.json
  • 00:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T352010)', diff saved to https://phabricator.wikimedia.org/P56876 and previous config saved to /var/cache/conftool/dbconfig/20240216-001612-ladsgroup.json
  • 00:06 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 00:02 thcipriani@deploy2002: Finished scap: Backport for Connection: Correct read-only detection (T354793 T356526) (duration: 10m 28s)
  • 00:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P56875 and previous config saved to /var/cache/conftool/dbconfig/20240216-000106-ladsgroup.json

2024-02-15

  • 23:55 thcipriani@deploy2002: ebernhardson and thcipriani: Continuing with sync
  • 23:53 thcipriani@deploy2002: ebernhardson and thcipriani: Backport for Connection: Correct read-only detection (T354793 T356526) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1005.eqiad.wmnet with OS bullseye
  • 23:52 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 23:52 thcipriani@deploy2002: Started scap: Backport for Connection: Correct read-only detection (T354793 T356526)
  • 23:50 thcipriani@deploy2002: Finished scap: Backport for Add border-collapse to wikitable (T357589) (duration: 11m 31s)
  • 23:46 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 23:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P56874 and previous config saved to /var/cache/conftool/dbconfig/20240215-234600-ladsgroup.json
  • 23:42 thcipriani@deploy2002: thcipriani and jdlrobson: Continuing with sync
  • 23:40 thcipriani@deploy2002: thcipriani and jdlrobson: Backport for Add border-collapse to wikitable (T357589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:38 thcipriani@deploy2002: Started scap: Backport for Add border-collapse to wikitable (T357589)
  • 23:33 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:33 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:32 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:32 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1005.eqiad.wmnet with reason: host reimage
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T352010)', diff saved to https://phabricator.wikimedia.org/P56873 and previous config saved to /var/cache/conftool/dbconfig/20240215-233053-ladsgroup.json
  • 23:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1005.eqiad.wmnet with reason: host reimage
  • 23:26 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 23:18 tzatziki: removing 2 files for legal compliance
  • 23:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1005.eqiad.wmnet with OS bullseye
  • 23:09 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.eqiad.wmnet with OS bullseye
  • 23:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1005.eqiad.wmnet with OS bullseye
  • 23:02 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1005.eqiad.wmnet with OS bullseye
  • 22:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 22:40 vriley@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-redacteddb1001']
  • 22:40 vriley@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-redacteddb1001']
  • 22:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1005.eqiad.wmnet with OS bullseye
  • 22:38 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-redacteddb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:34 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1005
  • 22:34 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "c_f"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 22:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1005
  • 22:30 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:30 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1005 to private IPs - bking@cumin2002"
  • 22:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1005 to private IPs - bking@cumin2002"
  • 22:27 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 22:25 wfan: payments-wiki upgraded from 29eb0fff to 709d89bf
  • 22:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudelastic1005.wikimedia.org
  • 22:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 22:16 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 22:12 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 22:08 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-redacteddb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:05 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic1005.wikimedia.org
  • 22:05 brennen@deploy2002: Finished scap: Backport for Filter out null external link attributes (T357668) (duration: 11m 40s)
  • 22:03 bking@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=cloudelastic1006\.eqiad\.wmnet
  • 22:00 vriley@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "c_f"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 21:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b_e"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad plugin upgrade - ryankemper@cumin2002 - T356651
  • 21:58 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:57 brennen@deploy2002: brennen: Continuing with sync
  • 21:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - ryankemper@cumin2002 - T356651
  • 21:54 brennen@deploy2002: brennen: Backport for Filter out null external link attributes (T357668) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:53 brennen@deploy2002: Started scap: Backport for Filter out null external link attributes (T357668)
  • 21:52 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1005* for IP migration - bking@cumin2002 - T355617
  • 21:52 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1005* for IP migration - bking@cumin2002 - T355617
  • 21:51 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
  • 21:51 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
  • 21:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1006.eqiad.wmnet with OS bullseye
  • 21:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 21:26 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b_e"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 21:21 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.18 refs T354436
  • 21:20 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "a_c"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "a_c"} and A:aqs and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:46 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.18 refs T354436 (duration: 08m 05s)
  • 20:41 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "rack3"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:38 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.18 refs T354436
  • 20:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2109 (T352010)', diff saved to https://phabricator.wikimedia.org/P56870 and previous config saved to /var/cache/conftool/dbconfig/20240215-202036-ladsgroup.json
  • 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T352010)', diff saved to https://phabricator.wikimedia.org/P56869 and previous config saved to /var/cache/conftool/dbconfig/20240215-202014-ladsgroup.json
  • 20:08 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "rack3"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:06 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "rack2"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P56868 and previous config saved to /var/cache/conftool/dbconfig/20240215-200507-ladsgroup.json
  • 20:00 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56867 and previous config saved to /var/cache/conftool/dbconfig/20240215-200015-arnaudb.json
  • 19:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 19:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P56866 and previous config saved to /var/cache/conftool/dbconfig/20240215-195001-ladsgroup.json
  • 19:48 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw plugin upgrade - ryankemper@cumin2002 - T356651
  • 19:45 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56865 and previous config saved to /var/cache/conftool/dbconfig/20240215-194510-arnaudb.json
  • 19:43 apergos: manually generating checksums in parallel for wikidata full history dumps run, in screen session, owned by ariel, on snapshot1009
  • 19:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1006.eqiad.wmnet with reason: host reimage
  • 19:39 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1006.eqiad.wmnet with reason: host reimage
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T352010)', diff saved to https://phabricator.wikimedia.org/P56864 and previous config saved to /var/cache/conftool/dbconfig/20240215-193455-ladsgroup.json
  • 19:31 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "rack2"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 19:30 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56863 and previous config saved to /var/cache/conftool/dbconfig/20240215-193005-arnaudb.json
  • 19:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1006.eqiad.wmnet with OS bullseye
  • 19:22 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.18 refs T354436
  • 19:15 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: T355866 - Post migration repool of es2024', diff saved to https://phabricator.wikimedia.org/P56862 and previous config saved to /var/cache/conftool/dbconfig/20240215-191500-arnaudb.json
  • 19:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56861 and previous config saved to /var/cache/conftool/dbconfig/20240215-191454-arnaudb.json
  • 19:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T352010)', diff saved to https://phabricator.wikimedia.org/P56860 and previous config saved to /var/cache/conftool/dbconfig/20240215-191226-ladsgroup.json
  • 19:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 19:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 19:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56859 and previous config saved to /var/cache/conftool/dbconfig/20240215-191203-ladsgroup.json
  • 19:11 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1006.eqiad.wmnet with OS bullseye
  • 19:04 brennen: train 1.42.0-wmf.18 (T354436): no current blockers, rolling to all wikis.
  • 18:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56858 and previous config saved to /var/cache/conftool/dbconfig/20240215-185949-arnaudb.json
  • 18:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246:3314', diff saved to https://phabricator.wikimedia.org/P56857 and previous config saved to /var/cache/conftool/dbconfig/20240215-185657-ladsgroup.json
  • 18:50 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1006.eqiad.wmnet
  • 18:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56856 and previous config saved to /var/cache/conftool/dbconfig/20240215-184444-arnaudb.json
  • 18:42 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices1006.eqiad.wmnet
  • 18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246:3314', diff saved to https://phabricator.wikimedia.org/P56855 and previous config saved to /var/cache/conftool/dbconfig/20240215-184150-ladsgroup.json
  • 18:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: T355866 - Post migration repool of db2122', diff saved to https://phabricator.wikimedia.org/P56853 and previous config saved to /var/cache/conftool/dbconfig/20240215-182939-arnaudb.json
  • 18:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56852 and previous config saved to /var/cache/conftool/dbconfig/20240215-182934-arnaudb.json
  • 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56850 and previous config saved to /var/cache/conftool/dbconfig/20240215-182644-ladsgroup.json
  • 18:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1006.eqiad.wmnet with OS bullseye
  • 18:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1006
  • 18:21 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1006
  • 18:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1006 to private IPs - bking@cumin2002"
  • 18:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1006 to private IPs - bking@cumin2002"
  • 18:18 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:18 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 18:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:17 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:15 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 18:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56849 and previous config saved to /var/cache/conftool/dbconfig/20240215-181429-arnaudb.json
  • 18:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudelastic1006.wikimedia.org
  • 18:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1006.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 18:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1006.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 18:09 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 18:02 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic1006.wikimedia.org
  • 17:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 50%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56848 and previous config saved to /var/cache/conftool/dbconfig/20240215-175924-arnaudb.json
  • 17:54 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.eqiad.wmnet
  • 17:48 swfrench-wmf: reenabled puppet on mediawiki::webserver hosts after deploying for T357436
  • 17:47 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.eqiad.wmnet
  • 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 25%: T355866 - Post migration repool of db2105', diff saved to https://phabricator.wikimedia.org/P56847 and previous config saved to /var/cache/conftool/dbconfig/20240215-174419-arnaudb.json
  • 17:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 100%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56846 and previous config saved to /var/cache/conftool/dbconfig/20240215-174414-arnaudb.json
  • 17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:37 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 17:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 17:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:34 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:33 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:33 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:32 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:31 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:31 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:30 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 75%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56844 and previous config saved to /var/cache/conftool/dbconfig/20240215-172909-arnaudb.json
  • 17:28 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1005.eqiad.wmnet
  • 17:24 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Migration Ongoing
  • 17:24 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "rack1"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 17:24 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Migration Ongoing
  • 17:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:23 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:21 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 17:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 50%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56843 and previous config saved to /var/cache/conftool/dbconfig/20240215-171403-arnaudb.json
  • 17:05 swfrench-wmf: disabling puppet shortly on mediawiki::webserver hosts to deploy T357436
  • 16:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2156 (re)pooling @ 25%: T355866 - Post migration repool of db2156', diff saved to https://phabricator.wikimedia.org/P56842 and previous config saved to /var/cache/conftool/dbconfig/20240215-165858-arnaudb.json
  • 16:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56841 and previous config saved to /var/cache/conftool/dbconfig/20240215-165853-arnaudb.json
  • 16:53 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:53 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:53 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:53 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:53 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:53 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:51 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "rack1"} and A:aqs and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 16:46 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@162f72f] (sessionstore): Deploying to updated target list — T353550 (duration: 00m 15s)
  • 16:46 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@162f72f] (sessionstore): Deploying to updated target list — T353550
  • 16:46 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@162f72f] (ml-cache): Deploying to updated target list — T353550 (duration: 00m 15s)
  • 16:46 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw2379.codfw.wmnet with reason: BGP issues - uncordoned, needs investigation
  • 16:45 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@162f72f] (ml-cache): Deploying to updated target list — T353550
  • 16:45 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw2379.codfw.wmnet with reason: BGP issues - uncordoned, needs investigation
  • 16:45 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@162f72f] (cassandra-dev): Deploying to updated target list — T353550 (duration: 00m 15s)
  • 16:45 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@162f72f] (cassandra-dev): Deploying to updated target list — T353550
  • 16:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56840 and previous config saved to /var/cache/conftool/dbconfig/20240215-164348-arnaudb.json
  • 16:43 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@162f72f] (aqs): Deploying to updated target list — T353550 (duration: 00m 37s)
  • 16:43 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@162f72f] (aqs): Deploying to updated target list — T353550
  • 16:40 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:40 hnowlan@cumin2002: conftool action : set/pooled=no; selector: name=mw2379.codfw.wmnet
  • 16:40 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:40 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:38 dancy@deploy2002: Finished scap: Backport for Load WikimediaCampaignEvents if CampaignEvents is loaded (T347909) (duration: 13m 36s)
  • 16:30 dancy@deploy2002: mhorsey and dancy: Continuing with sync
  • 16:29 hnowlan: kubectl cordon mw2379.codfw.wmnet - bgp issues
  • 16:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56839 and previous config saved to /var/cache/conftool/dbconfig/20240215-162843-arnaudb.json
  • 16:26 dancy@deploy2002: mhorsey and dancy: Backport for Load WikimediaCampaignEvents if CampaignEvents is loaded (T347909) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:24 dancy@deploy2002: Started scap: Backport for Load WikimediaCampaignEvents if CampaignEvents is loaded (T347909)
  • 16:16 hnowlan@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=(mw2311.codfw.wmnet|mw2335.codfw.wmnet|mw2379.codfw.wmnet|mw2380.codfw.wmnet|mw2383.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 16:14 Daimona: Creating new DB table for the WikimediaCampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T347909
  • 16:13 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=(mw2302|mw2303|mw2304|mw2305|mw2306|mw2307|mw2308|mw2309|mw2426).*
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: T355866 - Post migration repool of db2155', diff saved to https://phabricator.wikimedia.org/P56838 and previous config saved to /var/cache/conftool/dbconfig/20240215-161338-arnaudb.json
  • 16:13 claime: Repooling mw2302|mw2303|mw2304|mw2305|mw2306|mw2307|mw2308|mw2309|mw2426 - T355866
  • 16:13 claime: Uncordoning kubernetes2059.codfw.wmnet kubernetes2028.codfw.wmnet kubernetes2027.codfw.wmnet kubernetes2060.codfw.wmnet kubernetes2008.codfw.wmnet kubernetes2007.codfw.wmnet kubernetes2055.codfw.wmnet mw2301.codfw.wmnet mw2424.codfw.wmnet mw2425.codfw.wmnet mw2427.codfw.wmnet - T355866
  • 16:13 hnowlan@deploy2002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:12 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 16:12 hnowlan@deploy2002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 16:12 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 16:00 topranks: commencing move of server uplinks codfw row A6 T355866
  • 15:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 38 hosts with reason: Migrating servers in codfw rack A6 to lsw1-a6-codfw
  • 15:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 38 hosts with reason: Migrating servers in codfw rack A6 to lsw1-a6-codfw
  • 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on es2028.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on es2028.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on es2027.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on es2027.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:49 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a6-codfw.mgmt with reason: prepping for server uplink migration codfw rack a6
  • 15:49 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a6-codfw.mgmt with reason: prepping for server uplink migration codfw rack a6
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2024.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es2024.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2133.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2133.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2122.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2122.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2156.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2156.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2155.codfw.wmnet with reason: T355866 - Migrate servers in codfw rack A6 from asw-a6-codfw to lsw1-a6-codfw
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'T355866 - db2155 db2156 db2105 db2122 db2133 es2024', diff saved to https://phabricator.wikimedia.org/P56837 and previous config saved to /var/cache/conftool/dbconfig/20240215-154520-arnaudb.json
  • 15:24 moritzm: imported openssl11 1.1.1w-0+deb11u1+wmf2 to component/haproxy26 T352744 (with fix for libssl11-dev file contents)
  • 15:15 cgoubert@cumin2002: conftool action : set/pooled=inactive; selector: name=(mw2302|mw2303|mw2304|mw2305|mw2306|mw2307|mw2308|mw2309|mw2426).*
  • 15:15 claime: Depooling mw2302|mw2303|mw2304|mw2305|mw2306|mw2307|mw2308|mw2309|mw2426 - T355866
  • 15:14 claime: Draining kubernetes2059.codfw.wmnet kubernetes2028.codfw.wmnet kubernetes2027.codfw.wmnet kubernetes2060.codfw.wmnet kubernetes2008.codfw.wmnet kubernetes2007.codfw.wmnet kubernetes2055.codfw.wmnet mw2301.codfw.wmnet mw2424.codfw.wmnet mw2425.codfw.wmnet mw2427.codfw.wmnet - T355866
  • 15:12 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:47 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [ruwikiquote] Add 'suppressredirect' right to editors (T357241) (duration: 09m 26s)
  • 14:40 logmsgbot: lucaswerkmeister-wmde@deploy2002 superpes and lucaswerkmeister-wmde: Continuing with sync
  • 14:40 logmsgbot: lucaswerkmeister-wmde@deploy2002 superpes and lucaswerkmeister-wmde: Backport for [ruwikiquote] Add 'suppressredirect' right to editors (T357241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:38 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [ruwikiquote] Add 'suppressredirect' right to editors (T357241)
  • 14:37 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:36 hnowlan: migrating cirrusSearchLinksUpdate to k8s
  • 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:36 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:35 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:35 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:35 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:29 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for [rowiki] Change autoconfirmed setting (T355990) (duration: 10m 55s)
  • 14:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 14:21 logmsgbot: lucaswerkmeister-wmde@deploy2002 superpes and lucaswerkmeister-wmde: Continuing with sync
  • 14:20 logmsgbot: lucaswerkmeister-wmde@deploy2002 superpes and lucaswerkmeister-wmde: Backport for [rowiki] Change autoconfirmed setting (T355990) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:18 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for [rowiki] Change autoconfirmed setting (T355990)
  • 14:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bookworm
  • 14:16 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 14:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56836 and previous config saved to /var/cache/conftool/dbconfig/20240215-140613-ladsgroup.json
  • 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 14:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 14:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 13:48 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 13:01 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 12:34 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mw2379.codfw.wmnet
  • 12:32 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 12:30 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 12:21 hnowlan@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2379.codfw.wmnet
  • 12:17 moritzm: installing Linux 5.10.209 on Bullseye hosts
  • 12:11 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 11:59 claime: Bumping external traffic to mw-on-k8s to 45% - T357507
  • 11:57 cgoubert@deploy2002: Finished scap: Deploying mw-on-k8s 1003499 1003393 - T349796 T357507 (duration: 00m 50s)
  • 11:56 cgoubert@deploy2002: Started scap: Deploying mw-on-k8s 1003499 1003393 - T349796 T357507
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_test_cluster::hadoop::ui
  • 11:39 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1002.wikimedia.org
  • 11:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_test_cluster::hadoop::ui
  • 11:30 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddumps1002.wikimedia.org
  • 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2105 (T352010)', diff saved to https://phabricator.wikimedia.org/P56834 and previous config saved to /var/cache/conftool/dbconfig/20240215-112535-ladsgroup.json
  • 11:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:18 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2379.codfw.wmnet
  • 11:10 hnowlan@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2379.codfw.wmnet
  • 11:07 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Revert "Include article name in Ploticus error messages" (T357268), Revert "Include article name in Ploticus error messages" (T357268) (duration: 10m 59s)
  • 11:07 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=mkwiki --logwiki=metawiki 'CatCat' 'MonkeyPython' # T357602
  • 11:00 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and matmarex: Continuing with sync
  • 10:58 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and matmarex: Backport for Revert "Include article name in Ploticus error messages" (T357268), Revert "Include article name in Ploticus error messages" (T357268) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:56 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Revert "Include article name in Ploticus error messages" (T357268), Revert "Include article name in Ploticus error messages" (T357268)
  • 10:53 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="OGPawlis" . # T357605
  • 10:37 hnowlan: running `homer 'cr*codfw*' commit 'T351074'` for new k8s nodes
  • 10:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56831 and previous config saved to /var/cache/conftool/dbconfig/20240215-102409-ladsgroup.json
  • 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3314', diff saved to https://phabricator.wikimedia.org/P56830 and previous config saved to /var/cache/conftool/dbconfig/20240215-100903-ladsgroup.json
  • 09:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3314', diff saved to https://phabricator.wikimedia.org/P56829 and previous config saved to /var/cache/conftool/dbconfig/20240215-095356-ladsgroup.json
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
  • 09:43 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 09:42 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 09:42 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 09:41 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 09:41 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 09:40 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: eventlogging::analytics
  • 09:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3314 (T352010)', diff saved to https://phabricator.wikimedia.org/P56827 and previous config saved to /var/cache/conftool/dbconfig/20240215-093850-ladsgroup.json
  • 09:29 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: eventlogging::analytics
  • 08:50 moritzm: rebalance Ganeti codfw/A now that the switch maintenance for A5 and A6 is completed T355864 T355863
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1036.eqiad.wmnet
  • 08:35 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1036.eqiad.wmnet
  • 08:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: apifeatureusage::logstash
  • 08:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: apifeatureusage::logstash
  • 05:43 kart_: Update cxserver to 2023-12-04-083437-production (T344982, T338432, T351138)
  • 05:40 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:39 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:39 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:38 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P56823 and previous config saved to /var/cache/conftool/dbconfig/20240215-044554-ladsgroup.json
  • 04:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T352010)', diff saved to https://phabricator.wikimedia.org/P56822 and previous config saved to /var/cache/conftool/dbconfig/20240215-043047-ladsgroup.json
  • 04:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1036.eqiad.wmnet with OS bullseye
  • 02:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 02:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1036.eqiad.wmnet with reason: host reimage
  • 02:11 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1036.eqiad.wmnet with reason: host reimage
  • 01:55 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1036.eqiad.wmnet with OS bullseye
  • 01:46 aokoth@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM vrts1002.eqiad.wmnet
  • 01:37 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 00:45 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002

2024-02-14

  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T352010)', diff saved to https://phabricator.wikimedia.org/P56821 and previous config saved to /var/cache/conftool/dbconfig/20240214-235725-ladsgroup.json
  • 23:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T352010)', diff saved to https://phabricator.wikimedia.org/P56820 and previous config saved to /var/cache/conftool/dbconfig/20240214-235703-ladsgroup.json
  • 23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P56819 and previous config saved to /var/cache/conftool/dbconfig/20240214-234157-ladsgroup.json
  • 23:32 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "c"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 23:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P56818 and previous config saved to /var/cache/conftool/dbconfig/20240214-232651-ladsgroup.json
  • 23:14 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 23:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T352010)', diff saved to https://phabricator.wikimedia.org/P56817 and previous config saved to /var/cache/conftool/dbconfig/20240214-231144-ladsgroup.json
  • 23:10 eileen: civicrm upgraded from 3ee91f59 to 84ba0ccf
  • 22:51 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 22:50 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cloudelastic1008.eqiad.wmnet
  • 22:50 bking@cumin2002: conftool action : set/pooled=yes; selector: name=cloudelastic1007.eqiad.wmnet
  • 22:49 bking@cumin2002: conftool action : set/weight=10; selector: name=cloudelastic1008.eqiad.wmnet
  • 22:49 bking@cumin2002: conftool action : set/weight=10; selector: name=cloudelastic1007.eqiad.wmnet
  • 22:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 22:39 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "c"} and A:restbase and A:codfw: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 22:33 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 22:20 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1005*,cloudelastic1006* for IP migration - bking@cumin2002 - T355617
  • 22:20 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1005*,cloudelastic1006* for IP migration - bking@cumin2002 - T355617
  • 22:19 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
  • 22:19 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
  • 22:13 urandom: restarting Cassandra: restbase/codfw, row b — T353550
  • 22:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 22:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 22:08 cjming: end of UTC late backport window
  • 22:07 cjming@deploy2002: Finished scap: Backport for throttle.php: Add throttle rule for editathon (T356654) (duration: 08m 31s)
  • 22:00 cjming@deploy2002: zoranzoki21 and cjming: Continuing with sync
  • 22:00 cjming@deploy2002: zoranzoki21 and cjming: Backport for throttle.php: Add throttle rule for editathon (T356654) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T352010)', diff saved to https://phabricator.wikimedia.org/P56816 and previous config saved to /var/cache/conftool/dbconfig/20240214-215934-ladsgroup.json
  • 21:58 cjming@deploy2002: Started scap: Backport for throttle.php: Add throttle rule for editathon (T356654)
  • 21:56 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:56 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:53 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:53 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:52 cjming@deploy2002: Finished scap: Backport for Turn on Parsoid read views by default on wikitech Talk pages (T355374) (duration: 10m 44s)
  • 21:52 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1032.eqiad.wmnet: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 21:45 cjming@deploy2002: cscott and cjming: Continuing with sync
  • 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P56815 and previous config saved to /var/cache/conftool/dbconfig/20240214-214427-ladsgroup.json
  • 21:43 cjming@deploy2002: cscott and cjming: Backport for Turn on Parsoid read views by default on wikitech Talk pages (T355374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:41 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:41 cjming@deploy2002: Started scap: Backport for Turn on Parsoid read views by default on wikitech Talk pages (T355374)
  • 21:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:41 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:41 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:41 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1032.eqiad.wmnet: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 21:38 cjming@deploy2002: Finished scap: Backport for New communities will not share scripts going forward (T331679), Register dblist (duration: 10m 06s)
  • 21:36 eevans@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 21:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 100%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56814 and previous config saved to /var/cache/conftool/dbconfig/20240214-213544-arnaudb.json
  • 21:31 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:29 cjming@deploy2002: cjming and jdlrobson: Backport for New communities will not share scripts going forward (T331679), Register dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P56813 and previous config saved to /var/cache/conftool/dbconfig/20240214-212920-ladsgroup.json
  • 21:28 cjming@deploy2002: Started scap: Backport for New communities will not share scripts going forward (T331679), Register dblist
  • 21:26 cjming@deploy2002: Sync cancelled.
  • 21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 75%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56812 and previous config saved to /var/cache/conftool/dbconfig/20240214-212038-arnaudb.json
  • 21:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T352010)', diff saved to https://phabricator.wikimedia.org/P56811 and previous config saved to /var/cache/conftool/dbconfig/20240214-211413-ladsgroup.json
  • 21:08 cjming@deploy2002: cjming and jdlrobson: Backport for New communities will not share scripts going forward (T331679) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 50%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56810 and previous config saved to /var/cache/conftool/dbconfig/20240214-210531-arnaudb.json
  • 21:05 cjming@deploy2002: Started scap: Backport for New communities will not share scripts going forward (T331679)
  • 20:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1036.eqiad.wmnet with reason: host reimage
  • 20:57 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:56 inflatador: bking@pcc-db1001.puppet-diffs.eqiad1.wikimedia.cloud updating puppet facts for PCC
  • 20:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1036.eqiad.wmnet with reason: host reimage
  • 20:56 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 20:52 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "a"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 20:51 inflatador: bking@puppetmaster1001 manually updating facts data for PCC T355617
  • 20:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2176 (re)pooling @ 25%: T355864 - Post migration repool of db2176', diff saved to https://phabricator.wikimedia.org/P56808 and previous config saved to /var/cache/conftool/dbconfig/20240214-205027-arnaudb.json
  • 20:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56807 and previous config saved to /var/cache/conftool/dbconfig/20240214-205021-arnaudb.json
  • 20:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1036.eqiad.wmnet with OS bullseye
  • 20:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1007.eqiad.wmnet with reason: host reimage
  • 20:37 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:36 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:36 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1007.eqiad.wmnet with reason: host reimage
  • 20:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56806 and previous config saved to /var/cache/conftool/dbconfig/20240214-203517-arnaudb.json
  • 20:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:31 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:31 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:21 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1007
  • 20:20 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1007
  • 20:20 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56805 and previous config saved to /var/cache/conftool/dbconfig/20240214-202012-arnaudb.json
  • 20:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:16 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1007 to private IPs - bking@cumin2002"
  • 20:16 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1007 to private IPs - bking@cumin2002"
  • 20:12 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 20:06 arnaudb@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: T355864 - Post migration repool of db2175', diff saved to https://phabricator.wikimedia.org/P56804 and previous config saved to /var/cache/conftool/dbconfig/20240214-200507-arnaudb.json
  • 20:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 100%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56803 and previous config saved to /var/cache/conftool/dbconfig/20240214-200501-arnaudb.json
  • 20:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudelastic1007.wikimedia.org
  • 20:05 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 20:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 19:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "a"} and A:restbase and A:eqiad: Restart to pickup logging jars — T353550 - eevans@cumin1002
  • 19:58 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:54 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 05s)
  • 19:54 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:53 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 06s)
  • 19:53 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:53 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 07s)
  • 19:53 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:51 ejegg: payments-wiki upgraded from b699e513 to 29eb0fff
  • 19:51 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 03s)
  • 19:51 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:50 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 04s)
  • 19:50 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:50 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic1007.wikimedia.org
  • 19:50 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 05s)
  • 19:50 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 75%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56802 and previous config saved to /var/cache/conftool/dbconfig/20240214-194956-arnaudb.json
  • 19:46 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 34s)
  • 19:46 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:43 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 14s)
  • 19:43 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:42 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 45s)
  • 19:41 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:39 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 01m 17s)
  • 19:38 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2383.codfw.wmnet with OS bullseye
  • 19:36 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550 (duration: 00m 16s)
  • 19:36 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@0521449]: Deploying to updated target list — T353550
  • 19:35 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2380.codfw.wmnet with OS bullseye
  • 19:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 50%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56801 and previous config saved to /var/cache/conftool/dbconfig/20240214-193451-arnaudb.json
  • 19:34 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.18 refs T354436 (duration: 07m 35s)
  • 19:31 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550 (duration: 00m 20s)
  • 19:30 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550
  • 19:28 eevans@deploy2002: Finished deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550 (duration: 00m 41s)
  • 19:27 eevans@deploy2002: Started deploy [cassandra/logstash-logback-encoder@5c2dd00]: Deploying to updated target list — T353550
  • 19:26 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.18 refs T354436
  • 19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 25%: T355864 - Post migration repool of db2154', diff saved to https://phabricator.wikimedia.org/P56800 and previous config saved to /var/cache/conftool/dbconfig/20240214-191946-arnaudb.json
  • 19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2153 (re)pooling @ 100%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56799 and previous config saved to /var/cache/conftool/dbconfig/20240214-191941-arnaudb.json
  • 19:14 brennen: train 1.42.0-wmf.18 (T354436): logs chill, no current blockers, rolling to group1.
  • 19:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2383.codfw.wmnet with reason: host reimage
  • 19:11 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2380.codfw.wmnet with reason: host reimage
  • 19:09 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2383.codfw.wmnet with reason: host reimage
  • 19:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2380.codfw.wmnet with reason: host reimage
  • 19:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2153 (re)pooling @ 75%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56798 and previous config saved to /var/cache/conftool/dbconfig/20240214-190436-arnaudb.json
  • 18:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2379.codfw.wmnet with OS bullseye
  • 18:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host mw2383.codfw.wmnet with OS bullseye
  • 18:53 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host mw2380.codfw.wmnet with OS bullseye
  • 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T352010)', diff saved to https://phabricator.wikimedia.org/P56797 and previous config saved to /var/cache/conftool/dbconfig/20240214-185218-ladsgroup.json
  • 18:52 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw2383.codfw.wmnet on all recursors
  • 18:52 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache mw2383.codfw.wmnet on all recursors
  • 18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T352010)', diff saved to https://phabricator.wikimedia.org/P56796 and previous config saved to /var/cache/conftool/dbconfig/20240214-185207-ladsgroup.json
  • 18:52 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw2380.codfw.wmnet on all recursors
  • 18:51 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache mw2380.codfw.wmnet on all recursors
  • 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2153 (re)pooling @ 50%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56795 and previous config saved to /var/cache/conftool/dbconfig/20240214-184931-arnaudb.json
  • 18:48 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:48 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for codfw mw servers - cmooney@cumin1002"
  • 18:47 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for codfw mw servers - cmooney@cumin1002"
  • 18:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 18:37 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 18:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P56794 and previous config saved to /var/cache/conftool/dbconfig/20240214-183700-ladsgroup.json
  • 18:34 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2379.codfw.wmnet with reason: host reimage
  • 18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2153 (re)pooling @ 25%: T355864 - Post migration repool of db2153', diff saved to https://phabricator.wikimedia.org/P56793 and previous config saved to /var/cache/conftool/dbconfig/20240214-183426-arnaudb.json
  • 18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56792 and previous config saved to /var/cache/conftool/dbconfig/20240214-183421-arnaudb.json
  • 18:31 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2379.codfw.wmnet with reason: host reimage
  • 18:24 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet
  • 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P56791 and previous config saved to /var/cache/conftool/dbconfig/20240214-182154-ladsgroup.json
  • 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56790 and previous config saved to /var/cache/conftool/dbconfig/20240214-181916-arnaudb.json
  • 18:18 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1003.eqiad.wmnet
  • 18:14 hnowlan@cumin2002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet
  • 18:12 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host mw2379.codfw.wmnet with OS bullseye
  • 18:11 hnowlan: running `homer 'cr*codfw*' commit 'T351074'` to pick up mw2282's bgp change
  • 18:09 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mw2379.codfw.wmnet on all recursors
  • 18:09 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache mw2379.codfw.wmnet on all recursors
  • 18:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T352010)', diff saved to https://phabricator.wikimedia.org/P56789 and previous config saved to /var/cache/conftool/dbconfig/20240214-180647-ladsgroup.json
  • 18:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for mw2379 - cmooney@cumin1002"
  • 18:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for mw2379 - cmooney@cumin1002"
  • 18:05 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet
  • 18:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56788 and previous config saved to /var/cache/conftool/dbconfig/20240214-180411-arnaudb.json
  • 18:03 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 18:02 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 18:01 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 17:59 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2282.codfw.wmnet with reason: Testing if reimage is stable T355333
  • 17:59 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2282.codfw.wmnet with reason: Testing if reimage is stable T355333
  • 17:58 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1002.eqiad.wmnet
  • 17:56 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2282.codfw.wmnet with OS bullseye
  • 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: T355864 - Post migration repool of db2104', diff saved to https://phabricator.wikimedia.org/P56787 and previous config saved to /var/cache/conftool/dbconfig/20240214-174906-arnaudb.json
  • 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 100%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56786 and previous config saved to /var/cache/conftool/dbconfig/20240214-174900-arnaudb.json
  • 17:48 ladsgroup@deploy2002: Finished scap: Backport for Enable echo conditional defaults for loginwiki since 2013 (T357072) (duration: 12m 08s)
  • 17:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 17:41 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 17:39 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet
  • 17:39 ladsgroup@deploy2002: ladsgroup: Backport for Enable echo conditional defaults for loginwiki since 2013 (T357072) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:36 ladsgroup@deploy2002: Started scap: Backport for Enable echo conditional defaults for loginwiki since 2013 (T357072)
  • 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 75%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56785 and previous config saved to /var/cache/conftool/dbconfig/20240214-173355-arnaudb.json
  • 17:32 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirtlocal1001.eqiad.wmnet
  • 17:32 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2282.codfw.wmnet with reason: host reimage
  • 17:32 fnegri@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet
  • 17:29 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2282.codfw.wmnet with reason: host reimage
  • 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 50%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56784 and previous config saved to /var/cache/conftool/dbconfig/20240214-171850-arnaudb.json
  • 17:13 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 17:10 fabfur: enabled puppet on A:cp-upload to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1003109 selectively (T357479)
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2145 (re)pooling @ 25%: T355864 - Post migration repool of db2145', diff saved to https://phabricator.wikimedia.org/P56783 and previous config saved to /var/cache/conftool/dbconfig/20240214-170345-arnaudb.json
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56782 and previous config saved to /var/cache/conftool/dbconfig/20240214-170339-arnaudb.json
  • 16:56 fabfur: disabled puppet on A:cp-upload to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1003109 selectively (T357479)
  • 16:52 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56781 and previous config saved to /var/cache/conftool/dbconfig/20240214-164834-arnaudb.json
  • 16:48 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 16:37 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2005.codfw.wmnet with OS bookworm
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56780 and previous config saved to /var/cache/conftool/dbconfig/20240214-163330-arnaudb.json
  • 16:20 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: T355864 - Post migration repool of db2121', diff saved to https://phabricator.wikimedia.org/P56779 and previous config saved to /var/cache/conftool/dbconfig/20240214-161824-arnaudb.json
  • 16:17 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=(mw2402|mw2403|mw2404|mw2405|mw2407|mw2408|mw2409|mw2401|mw2410|mw2411|parse2001|parse2002|parse2003).*
  • 16:16 claime: Repooling mw2402|mw2403|mw2404|mw2405|mw2407|mw2408|mw2409|mw2401|mw2410|mw2411|parse2001|parse2002|parse2003 for T355864
  • 16:16 claime: Uncordoning kubernetes2019.codfw.wmnet kubernetes2018.codfw.wmnet mw2420.codfw.wmnet mw2421.codfw.wmnet mw2406.codfw.wmnet mw2422.codfw.wmnet mw2423.codfw.wmnet for T355864
  • 16:07 topranks: Moving server uplinks from old switch to new codfw rack A5 T355864
  • 16:07 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 16:07 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 38 hosts with reason: Migrating servers in codfw rack A5 to lsw1-a5-codfw
  • 16:06 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 38 hosts with reason: Migrating servers in codfw rack A5 to lsw1-a5-codfw
  • 16:04 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 15:59 topranks: disable puppet fleet-wide to allow for disruption to puppetmaster/puppetserver during network maint T355864
  • 15:59 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 15:54 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 15:53 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 15:53 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2282.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:53 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new master settings - bking@cumin2002 - T355617
  • 15:51 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a5-codfw.mgmt with reason: prepping for server uplink migration codfw rack a5
  • 15:50 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a5-codfw.mgmt with reason: prepping for server uplink migration codfw rack a5
  • 15:47 arnaudb@cumin1002: dbctl commit (dc=all): 'T355864 - Depool db2121 db2132 db2145 db2104 db2153 db2154 db2175 db2176', diff saved to https://phabricator.wikimedia.org/P56778 and previous config saved to /var/cache/conftool/dbconfig/20240214-154753-arnaudb.json
  • 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2176.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2175.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2154.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2154.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2153.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2104.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2104.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2145.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2132.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2132.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2121.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2121.codfw.wmnet with reason: T355864 - Migrate servers in codfw rack A5 from asw-a5-codfw to lsw1-a5-codfw
  • 15:44 hnowlan@cumin2002: START - Cookbook sre.hosts.provision for host mw2282.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:44 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
  • 15:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1042.eqiad.wmnet
  • 15:30 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1042.eqiad.wmnet
  • 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1041.eqiad.wmnet
  • 15:21 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2380.codfw.wmnet with OS bullseye
  • 15:15 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1041.eqiad.wmnet
  • 15:14 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2380.codfw.wmnet with OS bullseye
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1040.eqiad.wmnet
  • 15:11 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2379.codfw.wmnet with OS bullseye
  • 14:56 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:52 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2380.codfw.wmnet with OS bullseye
  • 14:52 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2379.codfw.wmnet with OS bullseye
  • 14:52 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2383.codfw.wmnet with OS bullseye
  • 14:51 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1038.eqiad.wmnet
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1037.eqiad.wmnet
  • 14:47 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2311.codfw.wmnet with reason: host reimage
  • 14:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2005.codfw.wmnet with OS bookworm
  • 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T352010)', diff saved to https://phabricator.wikimedia.org/P56777 and previous config saved to /var/cache/conftool/dbconfig/20240214-144537-ladsgroup.json
  • 14:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 14:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56776 and previous config saved to /var/cache/conftool/dbconfig/20240214-144514-ladsgroup.json
  • 14:45 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2383.codfw.wmnet with OS bullseye
  • 14:44 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2379.codfw.wmnet with OS bullseye
  • 14:44 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2335.codfw.wmnet with reason: host reimage
  • 14:44 claime: Restarted rsyslog on A:wikikube-master
  • 14:44 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2380.codfw.wmnet with OS bullseye
  • 14:43 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2380.codfw.wmnet with OS bullseye
  • 14:43 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2379.codfw.wmnet with OS bullseye
  • 14:42 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2383.codfw.wmnet with OS bullseye
  • 14:42 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2311.codfw.wmnet with reason: host reimage
  • 14:41 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2335.codfw.wmnet with reason: host reimage
  • 14:40 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1037.eqiad.wmnet
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1035.eqiad.wmnet
  • 14:35 cgoubert@cumin2002: conftool action : set/pooled=inactive; selector: name=(mw2402|mw2403|mw2404|mw2405|mw2407|mw2408|mw2409|mw2401|mw2410|mw2411|parse2001|parse2002|parse2003).*
  • 14:34 claime: Depooling mw2402|mw2403|mw2404|mw2405|mw2407|mw2408|mw2409|mw2401|mw2410|mw2411|parse2001|parse2002|parse2003 for T355864
  • 14:33 TheresNoTime: close UTC afternoon backport window
  • 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1035.eqiad.wmnet
  • 14:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 14:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host restbase1034.eqiad.wmnet
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P56774 and previous config saved to /var/cache/conftool/dbconfig/20240214-143006-ladsgroup.json
  • 14:29 samtar@deploy2002: Finished scap: Backport for prod: Stop setting $wgCampaignEventsEnableParticipantQuestions (T347608) (duration: 23m 37s)
  • 14:27 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 14:26 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2335.codfw.wmnet with OS bullseye
  • 14:26 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2383.codfw.wmnet with OS bullseye
  • 14:26 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2380.codfw.wmnet with OS bullseye
  • 14:26 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2379.codfw.wmnet with OS bullseye
  • 14:25 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2311.codfw.wmnet with OS bullseye
  • 14:22 samtar@deploy2002: samtar and daimona: Continuing with sync
  • 14:15 claime: Draining and cordoning kubernetes2019.codfw.wmnet kubernetes2018.codfw.wmnet mw2420.codfw.wmnet mw2421.codfw.wmnet mw2406.codfw.wmnet mw2422.codfw.wmnet mw2423.codfw.wmnet for T355864
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P56773 and previous config saved to /var/cache/conftool/dbconfig/20240214-141459-ladsgroup.json
  • 14:14 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 14:10 samtar@deploy2002: samtar and daimona: Backport for prod: Stop setting $wgCampaignEventsEnableParticipantQuestions (T347608) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host restbase1034.eqiad.wmnet
  • 14:06 samtar@deploy2002: Started scap: Backport for prod: Stop setting $wgCampaignEventsEnableParticipantQuestions (T347608)
  • 14:05 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:03 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56772 and previous config saved to /var/cache/conftool/dbconfig/20240214-135953-ladsgroup.json
  • 13:59 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host eventlog1003.eqiad.wmnet with OS bullseye
  • 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T352010)', diff saved to https://phabricator.wikimedia.org/P56771 and previous config saved to /var/cache/conftool/dbconfig/20240214-135813-ladsgroup.json
  • 13:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T352010)', diff saved to https://phabricator.wikimedia.org/P56770 and previous config saved to /var/cache/conftool/dbconfig/20240214-135750-ladsgroup.json
  • 13:52 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56769 and previous config saved to /var/cache/conftool/dbconfig/20240214-134959-ladsgroup.json
  • 13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P56768 and previous config saved to /var/cache/conftool/dbconfig/20240214-134929-ladsgroup.json
  • 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P56767 and previous config saved to /var/cache/conftool/dbconfig/20240214-134244-ladsgroup.json
  • 13:42 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host apifeatureusage2001.codfw.wmnet
  • 13:39 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on eventlog1003.eqiad.wmnet with reason: host reimage
  • 13:36 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on eventlog1003.eqiad.wmnet with reason: host reimage
  • 13:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P56766 and previous config saved to /var/cache/conftool/dbconfig/20240214-133422-ladsgroup.json
  • 13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P56765 and previous config saved to /var/cache/conftool/dbconfig/20240214-132737-ladsgroup.json
  • 13:26 Daimona: T357007 Profiling current master version of CampaignEvents:GenerateInvitationList with excimer in mwmaint2002
  • 13:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host apifeatureusage2001.codfw.wmnet
  • 13:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2005.codfw.wmnet with OS bookworm
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2003.codfw.wmnet with OS bookworm
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 13:24 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host eventlog1003.eqiad.wmnet with OS bullseye
  • 13:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P56764 and previous config saved to /var/cache/conftool/dbconfig/20240214-131916-ladsgroup.json
  • 13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T352010)', diff saved to https://phabricator.wikimedia.org/P56763 and previous config saved to /var/cache/conftool/dbconfig/20240214-131231-ladsgroup.json
  • 13:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 13:07 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2005.codfw.wmnet with reason: host reimage
  • 13:05 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P56762 and previous config saved to /var/cache/conftool/dbconfig/20240214-130410-ladsgroup.json
  • 13:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P56761 and previous config saved to /var/cache/conftool/dbconfig/20240214-130157-ladsgroup.json
  • 13:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T352010)', diff saved to https://phabricator.wikimedia.org/P56760 and previous config saved to /var/cache/conftool/dbconfig/20240214-130134-ladsgroup.json
  • 12:52 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 12:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be1045
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P56758 and previous config saved to /var/cache/conftool/dbconfig/20240214-124627-ladsgroup.json
  • 12:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P56757 and previous config saved to /var/cache/conftool/dbconfig/20240214-123120-ladsgroup.json
  • 12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T352010)', diff saved to https://phabricator.wikimedia.org/P56756 and previous config saved to /var/cache/conftool/dbconfig/20240214-121614-ladsgroup.json
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T352010)', diff saved to https://phabricator.wikimedia.org/P56755 and previous config saved to /var/cache/conftool/dbconfig/20240214-121401-ladsgroup.json
  • 12:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T352010)', diff saved to https://phabricator.wikimedia.org/P56754 and previous config saved to /var/cache/conftool/dbconfig/20240214-121337-ladsgroup.json
  • 12:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
  • 12:02 logmsgbot: lucaswerkmeister-wmde@deploy2002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 12:02 logmsgbot: lucaswerkmeister-wmde@deploy2002 helmfile [codfw] START helmfile.d/services/termbox: apply
  • 12:00 logmsgbot: lucaswerkmeister-wmde@deploy2002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 11:59 logmsgbot: lucaswerkmeister-wmde@deploy2002 helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 11:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P56753 and previous config saved to /var/cache/conftool/dbconfig/20240214-115831-ladsgroup.json
  • 11:58 logmsgbot: lucaswerkmeister-wmde@deploy2002 helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 11:58 logmsgbot: lucaswerkmeister-wmde@deploy2002 helmfile [staging] START helmfile.d/services/termbox: apply
  • 11:57 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be1045
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2003.codfw.wmnet with reason: host reimage
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2003.codfw.wmnet with reason: host reimage
  • 11:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-be1045.eqiad.wmnet
  • 11:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P56752 and previous config saved to /var/cache/conftool/dbconfig/20240214-114325-ladsgroup.json
  • 11:40 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be1045.eqiad.wmnet
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2003.codfw.wmnet with OS bookworm
  • 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T352010)', diff saved to https://phabricator.wikimedia.org/P56751 and previous config saved to /var/cache/conftool/dbconfig/20240214-112818-ladsgroup.json
  • 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T352010)', diff saved to https://phabricator.wikimedia.org/P56750 and previous config saved to /var/cache/conftool/dbconfig/20240214-112606-ladsgroup.json
  • 11:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T352010)', diff saved to https://phabricator.wikimedia.org/P56749 and previous config saved to /var/cache/conftool/dbconfig/20240214-112543-ladsgroup.json
  • 11:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 11:14 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 11:14 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
  • 11:14 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
  • 11:14 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
  • 11:14 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
  • 11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P56748 and previous config saved to /var/cache/conftool/dbconfig/20240214-111037-ladsgroup.json
  • 11:06 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host puppetserver2003.codfw.wmnet with OS bookworm
  • 10:58 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:58 aokoth@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM vrts1002.eqiad.wmnet
  • 10:58 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 10:58 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:57 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 10:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:56 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 10:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P56747 and previous config saved to /var/cache/conftool/dbconfig/20240214-105530-ladsgroup.json
  • 10:52 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 10:48 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 10:48 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 10:48 jelto: import prometheus-etherpad-exporter 0.7 to bookworm-wikimedia on apt hosts - T316421
  • 10:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet
  • 10:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2005.codfw.wmnet
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2003.codfw.wmnet with OS bookworm
  • 10:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T352010)', diff saved to https://phabricator.wikimedia.org/P56746 and previous config saved to /var/cache/conftool/dbconfig/20240214-104024-ladsgroup.json
  • 10:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T352010)', diff saved to https://phabricator.wikimedia.org/P56745 and previous config saved to /var/cache/conftool/dbconfig/20240214-103810-ladsgroup.json
  • 10:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:37 slyngs: Deploying new PKI checks to alertmanager
  • 10:33 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 10:33 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 10:31 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host puppetserver2003.codfw.wmnet with OS bookworm
  • 10:28 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 10:28 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2003.codfw.wmnet with OS bookworm
  • 10:18 godog: powercycle titan1001
  • 10:02 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 09:55 moritzm: installing Linux 5.10.209 on Bullseye hosts
  • 09:49 moritzm: imported openssl11 1.1.1w-0+deb11u1+wmf1 to component/haproxy26 T352744
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 09:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 08:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T352010)', diff saved to https://phabricator.wikimedia.org/P56744 and previous config saved to /var/cache/conftool/dbconfig/20240214-084146-ladsgroup.json
  • 08:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T352010)', diff saved to https://phabricator.wikimedia.org/P56743 and previous config saved to /var/cache/conftool/dbconfig/20240214-084104-ladsgroup.json
  • 08:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P56742 and previous config saved to /var/cache/conftool/dbconfig/20240214-082558-ladsgroup.json
  • 08:20 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1001.wikimedia.org
  • 08:12 taavi: restart apache2 on lists1001 to remove traces of old, soon-to-expire TLS certificate
  • 08:11 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddumps1001.wikimedia.org
  • 08:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P56741 and previous config saved to /var/cache/conftool/dbconfig/20240214-081051-ladsgroup.json
  • 07:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T352010)', diff saved to https://phabricator.wikimedia.org/P56740 and previous config saved to /var/cache/conftool/dbconfig/20240214-075545-ladsgroup.json
  • 07:51 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 07:50 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 07:50 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 07:49 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 07:48 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 07:48 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 07:48 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 07:47 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 06:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1035.eqiad.wmnet with OS bullseye
  • 06:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 03:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T352010)', diff saved to https://phabricator.wikimedia.org/P56739 and previous config saved to /var/cache/conftool/dbconfig/20240214-031125-ladsgroup.json
  • 03:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 03:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 03:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T352010)', diff saved to https://phabricator.wikimedia.org/P56738 and previous config saved to /var/cache/conftool/dbconfig/20240214-031103-ladsgroup.json
  • 02:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P56737 and previous config saved to /var/cache/conftool/dbconfig/20240214-025557-ladsgroup.json
  • 02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P56736 and previous config saved to /var/cache/conftool/dbconfig/20240214-024050-ladsgroup.json
  • 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T352010)', diff saved to https://phabricator.wikimedia.org/P56735 and previous config saved to /var/cache/conftool/dbconfig/20240214-022544-ladsgroup.json
  • 01:44 eileen: civicrm upgraded from 497e0899 to 3ee91f59
  • 00:04 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
  • 00:01 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage

2024-02-13

  • 23:55 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 21:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T352010)', diff saved to https://phabricator.wikimedia.org/P56734 and previous config saved to /var/cache/conftool/dbconfig/20240213-212343-ladsgroup.json
  • 21:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 21:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T352010)', diff saved to https://phabricator.wikimedia.org/P56733 and previous config saved to /var/cache/conftool/dbconfig/20240213-212321-ladsgroup.json
  • 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P56732 and previous config saved to /var/cache/conftool/dbconfig/20240213-210814-ladsgroup.json
  • 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T352010)', diff saved to https://phabricator.wikimedia.org/P56731 and previous config saved to /var/cache/conftool/dbconfig/20240213-210813-ladsgroup.json
  • 20:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 20:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P56730 and previous config saved to /var/cache/conftool/dbconfig/20240213-205308-ladsgroup.json
  • 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P56729 and previous config saved to /var/cache/conftool/dbconfig/20240213-205307-ladsgroup.json
  • 20:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P56728 and previous config saved to /var/cache/conftool/dbconfig/20240213-203800-ladsgroup.json
  • 20:23 mutante: phab1004 - running public_task_dump.py T355502
  • 20:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T352010)', diff saved to https://phabricator.wikimedia.org/P56727 and previous config saved to /var/cache/conftool/dbconfig/20240213-202254-ladsgroup.json
  • 20:22 brennen@deploy2002: Finished deploy [phabricator/deployment@f4a7f50]: deploy to phab1004 for T357464 (duration: 00m 48s)
  • 20:21 brennen@deploy2002: Started deploy [phabricator/deployment@f4a7f50]: deploy to phab1004 for T357464
  • 20:20 brennen@deploy2002: Finished deploy [phabricator/deployment@f4a7f50]: test deploy to phab2002 for T357464 (duration: 00m 29s)
  • 20:20 brennen@deploy2002: Started deploy [phabricator/deployment@f4a7f50]: test deploy to phab2002 for T357464
  • 20:08 eileen: civicrm upgraded from ac69725f to 497e0899
  • 19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T352010)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240213-195724-ladsgroup.json
  • 19:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T352010)', diff saved to https://phabricator.wikimedia.org/P56725 and previous config saved to /var/cache/conftool/dbconfig/20240213-195701-ladsgroup.json
  • 19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P56724 and previous config saved to /var/cache/conftool/dbconfig/20240213-194155-ladsgroup.json
  • 19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P56723 and previous config saved to /var/cache/conftool/dbconfig/20240213-192648-ladsgroup.json
  • 19:11 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.18 refs T354436
  • 19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T352010)', diff saved to https://phabricator.wikimedia.org/P56722 and previous config saved to /var/cache/conftool/dbconfig/20240213-191142-ladsgroup.json
  • 19:01 brennen: train 1.42.0-wmf.18 (T354436): no current blockers, rolling to group0.
  • 18:43 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-ui1001.eqiad.wmnet with OS bullseye
  • 18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T352010)', diff saved to https://phabricator.wikimedia.org/P56721 and previous config saved to /var/cache/conftool/dbconfig/20240213-184159-ladsgroup.json
  • 18:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T352010)', diff saved to https://phabricator.wikimedia.org/P56720 and previous config saved to /var/cache/conftool/dbconfig/20240213-184137-ladsgroup.json
  • 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P56718 and previous config saved to /var/cache/conftool/dbconfig/20240213-182630-ladsgroup.json
  • 18:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P56717 and previous config saved to /var/cache/conftool/dbconfig/20240213-181124-ladsgroup.json
  • 18:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 18:00 ladsgroup@deploy2002: Finished scap: Backport for ruwiki: Add 'edituserjson' right to 'engineers' group (T355499) (duration: 08m 28s)
  • 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T352010)', diff saved to https://phabricator.wikimedia.org/P56716 and previous config saved to /var/cache/conftool/dbconfig/20240213-175617-ladsgroup.json
  • 17:53 ladsgroup@deploy2002: ammarpad and ladsgroup: Continuing with sync
  • 17:53 ladsgroup@deploy2002: ammarpad and ladsgroup: Backport for ruwiki: Add 'edituserjson' right to 'engineers' group (T355499) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:52 ladsgroup@deploy2002: Started scap: Backport for ruwiki: Add 'edituserjson' right to 'engineers' group (T355499)
  • 17:49 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Use EditEntity for ItemMergeInteractor (T356149 T356764), Use EditEntity for MergeLexemesInteractor (T356149 T356764), Use EditEntity for ItemMergeInteractor (T356149 T356764), Use EditEntity for MergeLexemesInteractor (T356149 T356764) (duration: 10m 11s)
  • 17:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1035.eqiad.wmnet with reason: host reimage
  • 17:43 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-ui1001.eqiad.wmnet with reason: host reimage
  • 17:42 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync
  • 17:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1035.eqiad.wmnet with reason: host reimage
  • 17:41 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-ui1001.eqiad.wmnet with reason: host reimage
  • 17:40 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for Use EditEntity for ItemMergeInteractor (T356149 T356764), Use EditEntity for MergeLexemesInteractor (T356149 T356764), Use EditEntity for ItemMergeInteractor (T356149 T356764), Use EditEntity for MergeLexemesInteractor (T356149 T356764) synced to the testservers (https://wik
  • 17:39 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Use EditEntity for ItemMergeInteractor (T356149 T356764), Use EditEntity for MergeLexemesInteractor (T356149 T356764), Use EditEntity for ItemMergeInteractor (T356149 T356764), Use EditEntity for MergeLexemesInteractor (T356149 T356764)
  • 17:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[2027-2028].codfw.wmnet
  • 17:37 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp[2027-2028].codfw.wmnet
  • 17:36 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,name=cp202(7|8).codfw.wmnet
  • 17:29 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-ui1001.eqiad.wmnet with OS bullseye
  • 17:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1035.eqiad.wmnet with OS bullseye
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T352010)', diff saved to https://phabricator.wikimedia.org/P56715 and previous config saved to /var/cache/conftool/dbconfig/20240213-172620-ladsgroup.json
  • 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T352010)', diff saved to https://phabricator.wikimedia.org/P56714 and previous config saved to /var/cache/conftool/dbconfig/20240213-172542-ladsgroup.json
  • 17:25 brennen@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.18 refs T354436 (duration: 24m 39s)
  • 17:23 sukhe: running authdns-update to lower dyna TTLs: T140365
  • 17:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P56713 and previous config saved to /var/cache/conftool/dbconfig/20240213-171034-ladsgroup.json
  • 17:04 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 17:00 brennen@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.18 refs T354436
  • 16:55 volans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1001.eqiad.wmnet with reason: training
  • 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P56712 and previous config saved to /var/cache/conftool/dbconfig/20240213-165527-ladsgroup.json
  • 16:55 volans@cumin1002: START - Cookbook sre.hosts.downtime for 0:05:00 on sretest1001.eqiad.wmnet with reason: training
  • 16:49 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1146.eqiad.wmnet
  • 16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1146.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:41 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1146.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T352010)', diff saved to https://phabricator.wikimedia.org/P56709 and previous config saved to /var/cache/conftool/dbconfig/20240213-164021-ladsgroup.json
  • 16:39 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 16:36 sukhe: running authdns-update for CR 1003017: T346394
  • 16:34 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1146.eqiad.wmnet
  • 16:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1145.eqiad.wmnet
  • 16:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1145.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:32 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1145.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:30 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 16:24 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1145.eqiad.wmnet
  • 16:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1144.eqiad.wmnet
  • 16:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1144.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:23 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1144.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:21 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 16:14 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1144.eqiad.wmnet
  • 16:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1140.eqiad.wmnet
  • 16:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1140.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:12 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1140.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:11 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 16:10 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T352010)', diff saved to https://phabricator.wikimedia.org/P56707 and previous config saved to /var/cache/conftool/dbconfig/20240213-160826-ladsgroup.json
  • 16:08 topranks: moving codfw rack a4 server links T355863
  • 16:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:05 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Migrating servers in codfw rack A4 to lsw1-a4-codfw
  • 16:05 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Migrating servers in codfw rack A4 to lsw1-a4-codfw
  • 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1140.eqiad.wmnet
  • 16:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 23 hosts
  • 16:04 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for 23 hosts
  • 16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw-a-codfw,cr[1-2]-codfw,lsw1-a4-codfw.mgmt
  • 16:04 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for asw-a-codfw,cr[1-2]-codfw,lsw1-a4-codfw.mgmt
  • 16:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1139.eqiad.wmnet
  • 16:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1139.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 16:02 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1139.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:59 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 15:57 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 15:56 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 15:54 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1139.eqiad.wmnet
  • 15:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1135.eqiad.wmnet
  • 15:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1135.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:53 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:52 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1135.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:52 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:50 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 15:50 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 15:50 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 15:44 topranks: moving netbox links and pre-configuring lsw1-a4-codfw for servers before network move T355863
  • 15:44 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1135.eqiad.wmnet
  • 15:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1133.eqiad.wmnet
  • 15:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1133.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:42 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1133.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T352010)', diff saved to https://phabricator.wikimedia.org/P56704 and previous config saved to /var/cache/conftool/dbconfig/20240213-154100-ladsgroup.json
  • 15:40 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 15:39 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 15:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T352010)', diff saved to https://phabricator.wikimedia.org/P56703 and previous config saved to /var/cache/conftool/dbconfig/20240213-153720-ladsgroup.json
  • 15:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 15:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 15:34 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1133.eqiad.wmnet
  • 15:32 cgoubert@deploy2002: Finished scap: mw-on-k8s: Raise the number of canary replicas - T357402 (duration: 02m 58s)
  • 15:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[2061-2062,2089].codfw.wmnet with reason: T355863
  • 15:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[2061-2062,2089].codfw.wmnet with reason: T355863
  • 15:29 cgoubert@deploy2002: Started scap: mw-on-k8s: Raise the number of canary replicas - T357402
  • 15:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1149.eqiad.wmnet
  • 15:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1149.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 15:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P56702 and previous config saved to /var/cache/conftool/dbconfig/20240213-152554-ladsgroup.json
  • 15:21 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(mw1431.eqiad.wmnet|mw1430.eqiad.wmnet|mw1434.eqiad.wmnet|mw1453.eqiad.wmnet|mw1385.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 15:20 stevemunene@deploy2002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 15:20 arnaudb@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1149.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - arnaudb@cumin1002"
  • 15:20 stevemunene@deploy2002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 15:19 stevemunene@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 15:19 stevemunene@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 15:18 stevemunene@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 15:17 stevemunene@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 15:16 stevemunene@deploy2002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 15:16 arnaudb@cumin1002: START - Cookbook sre.dns.netbox
  • 15:15 stevemunene@deploy2002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 15:15 stevemunene@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 15:14 stevemunene@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 15:14 hnowlan: running `homer 'cr*eqiad*' commit 'T351074'`
  • 15:14 stevemunene@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 15:13 stevemunene@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 15:11 arnaudb@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1149.eqiad.wmnet
  • 15:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P56701 and previous config saved to /var/cache/conftool/dbconfig/20240213-151047-ladsgroup.json
  • 14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T352010)', diff saved to https://phabricator.wikimedia.org/P56700 and previous config saved to /var/cache/conftool/dbconfig/20240213-145541-ladsgroup.json
  • 14:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2027-2028].codfw.wmnet with reason: T355863
  • 14:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2027-2028].codfw.wmnet with reason: T355863
  • 14:46 brett@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,name=cp202(7|8).codfw.wmnet
  • 14:36 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sessionstore[2001-2003].codfw.wmnet
  • 14:36 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sessionstore[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 14:35 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sessionstore[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 14:33 moritzm: imported openssl 1.1.1w-0+deb11u1+wmf1 to component/haproxy26 T352744
  • 14:30 eevans@cumin1002: START - Cookbook sre.dns.netbox
  • 14:26 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2061*,elastic2062*,elastic2089* for switch maintenance - bking@cumin2002 - T355863
  • 14:26 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2061*,elastic2062*,elastic2089* for switch maintenance - bking@cumin2002 - T355863
  • 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2109 (T352010)', diff saved to https://phabricator.wikimedia.org/P56699 and previous config saved to /var/cache/conftool/dbconfig/20240213-142250-ladsgroup.json
  • 14:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T352010)', diff saved to https://phabricator.wikimedia.org/P56698 and previous config saved to /var/cache/conftool/dbconfig/20240213-142228-ladsgroup.json
  • 14:20 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts sessionstore[2001-2003].codfw.wmnet
  • 14:18 godog: bounce puppetserver on puppetserver1003 to test noop config change - T352640
  • 14:11 jelto: import etherpad-lite 1.9.7-2 on apt host into bookworm-wikimedia - T316421
  • 14:08 effie: restarting envoy on baremetal mediawiki api servers
  • 14:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P56697 and previous config saved to /var/cache/conftool/dbconfig/20240213-140722-ladsgroup.json
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P56696 and previous config saved to /var/cache/conftool/dbconfig/20240213-135215-ladsgroup.json
  • 13:48 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.eqiad.wmnet
  • 13:45 hashar@deploy2002: Finished deploy [gerrit/gerrit@737c475]: wm-checks-api: Gerrit 3.8 no more sets redundant real_author (duration: 00m 07s)
  • 13:45 hashar@deploy2002: Started deploy [gerrit/gerrit@737c475]: wm-checks-api: Gerrit 3.8 no more sets redundant real_author
  • 13:42 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 13:40 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 13:40 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.eqiad.wmnet
  • 13:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T352010)', diff saved to https://phabricator.wikimedia.org/P56695 and previous config saved to /var/cache/conftool/dbconfig/20240213-133709-ladsgroup.json
  • 13:33 hashar@deploy2002: Finished deploy [gerrit/gerrit@7dd9a27]: Support Gerrit 3.8 CSS styling API - T354886 (duration: 00m 07s)
  • 13:33 hashar@deploy2002: Started deploy [gerrit/gerrit@7dd9a27]: Support Gerrit 3.8 CSS styling API - T354886
  • 13:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1430.eqiad.wmnet with OS bullseye
  • 13:28 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1453.eqiad.wmnet with OS bullseye
  • 13:28 hashar@deploy2002: Finished deploy [gerrit/gerrit@b02c97e]: Let Gerrit manage light/dark theme (duration: 00m 07s)
  • 13:28 hashar@deploy2002: Started deploy [gerrit/gerrit@b02c97e]: Let Gerrit manage light/dark theme
  • 13:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1434.eqiad.wmnet with OS bullseye
  • 13:24 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1385.eqiad.wmnet with OS bullseye
  • 13:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.eqiad.wmnet
  • 13:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1431.eqiad.wmnet with OS bullseye
  • 13:13 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1430.eqiad.wmnet with reason: host reimage
  • 13:12 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.eqiad.wmnet
  • 13:11 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 13:11 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 13:11 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 13:11 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 13:10 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 13:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1453.eqiad.wmnet with reason: host reimage
  • 13:10 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 13:08 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: host reimage
  • 13:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1385.eqiad.wmnet with reason: host reimage
  • 13:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1006.eqiad.wmnet
  • 13:03 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1431.eqiad.wmnet with reason: host reimage
  • 13:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2105 (T352010)', diff saved to https://phabricator.wikimedia.org/P56694 and previous config saved to /var/cache/conftool/dbconfig/20240213-130316-ladsgroup.json
  • 13:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56693 and previous config saved to /var/cache/conftool/dbconfig/20240213-130255-ladsgroup.json
  • 13:02 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1385.eqiad.wmnet with reason: host reimage
  • 13:01 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1453.eqiad.wmnet with reason: host reimage
  • 13:01 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1430.eqiad.wmnet with reason: host reimage
  • 13:01 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: host reimage
  • 13:00 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1431.eqiad.wmnet with reason: host reimage
  • 12:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1035.eqiad.wmnet with reason: host reimage
  • 12:56 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.eqiad.wmnet
  • 12:56 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="Jeff G." . # T357403
  • 12:54 effie: restarting envoy on baremetal mediawiki appservers
  • 12:54 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1035.eqiad.wmnet with reason: host reimage
  • 12:48 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1385.eqiad.wmnet with OS bullseye
  • 12:48 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1453.eqiad.wmnet with OS bullseye
  • 12:48 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1434.eqiad.wmnet with OS bullseye
  • 12:48 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1430.eqiad.wmnet with OS bullseye
  • 12:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P56692 and previous config saved to /var/cache/conftool/dbconfig/20240213-124748-ladsgroup.json
  • 12:47 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host mw1431.eqiad.wmnet with OS bullseye
  • 12:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1035.eqiad.wmnet with OS bullseye
  • 12:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P56691 and previous config saved to /var/cache/conftool/dbconfig/20240213-123242-ladsgroup.json
  • 12:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56690 and previous config saved to /var/cache/conftool/dbconfig/20240213-121736-ladsgroup.json
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56689 and previous config saved to /var/cache/conftool/dbconfig/20240213-120035-ladsgroup.json
  • 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:50 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apifeatureusage2001.codfw.wmnet with OS bullseye
  • 11:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: T350458
  • 11:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: T350458
  • 11:37 cgoubert@deploy2002: Finished scap: Change default maxUnavailable for mw-on-k8s to 10% (duration: 03m 17s)
  • 11:34 cgoubert@deploy2002: Started scap: Change default maxUnavailable for mw-on-k8s to 10%
  • 11:34 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apifeatureusage2001.codfw.wmnet with reason: host reimage
  • 11:33 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:32 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:31 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apifeatureusage2001.codfw.wmnet with reason: host reimage
  • 11:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:27 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 11:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:24 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:24 claime: Change default maxUnavailable for mw-on-k8s to 10%
  • 11:21 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host apifeatureusage2001.codfw.wmnet with OS bullseye
  • 11:20 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apifeatureusage1001.eqiad.wmnet with OS bullseye
  • 11:14 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 11:14 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 11:14 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 11:13 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 11:12 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:11 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:10 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 11:10 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 11:04 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 11:01 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 11:01 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2388.codfw.wmnet
  • 11:01 cgoubert@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2388.codfw.wmnet
  • 10:57 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 10:49 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host apifeatureusage1001.eqiad.wmnet with OS bullseye
  • 10:41 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apifeatureusage1001.eqiad.wmnet with OS bookworm
  • 10:39 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 10:36 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 10:25 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host apifeatureusage1001.eqiad.wmnet with OS bookworm
  • 10:23 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apifeatureusage1001.eqiad.wmnet with OS bookworm
  • 10:23 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 10:23 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 10:22 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 10:22 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 10:22 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:22 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 10:09 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 10:06 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2002-dev.codfw.wmnet
  • 09:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host clouddb2002-dev.codfw.wmnet
  • 09:57 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host apifeatureusage1001.eqiad.wmnet with OS bookworm
  • 09:23 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 09:22 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 09:22 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 09:22 akosiaris: delete sessionstore pod to force rescheduling
  • 09:21 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 09:20 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:20 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apifeatureusage1001.eqiad.wmnet with OS bookworm
  • 09:20 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:18 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 09:16 brouberol@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apifeatureusage1001.eqiad.wmnet with reason: host reimage
  • 09:04 brouberol@cumin1002: START - Cookbook sre.hosts.reimage for host apifeatureusage1001.eqiad.wmnet with OS bookworm
  • 08:28 hashar@deploy2002: Finished scap: Backport for Increase $wgMaxUploadSize to 5 GiB (previously was 4GiB). (T191804) (duration: 08m 57s)
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: grafana
  • 08:21 hashar@deploy2002: hashar and bawolff: Continuing with sync
  • 08:21 hashar@deploy2002: hashar and bawolff: Backport for Increase $wgMaxUploadSize to 5 GiB (previously was 4GiB). (T191804) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:19 hashar@deploy2002: Started scap: Backport for Increase $wgMaxUploadSize to 5 GiB (previously was 4GiB). (T191804)
  • 08:18 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: grafana
  • 07:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 07:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 04:57 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.18 refs T354436 (duration: 52m 36s)
  • 04:04 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.18 refs T354436
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.15 (duration: 02m 09s)
  • 02:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:12 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:02 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:27 eileen: civicrm upgraded from 684286b4 to ac69725f

2024-02-12

  • 23:51 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 23:25 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="Yann" . # T357208
  • 23:16 Daimona: T357007 Running mwscript CampaignEvents:GenerateInvitationList --wiki=metawiki --listfile=/home/daimona/list2.txt
  • 23:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 23:03 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host etherpad2002.codfw.wmnet
  • 23:03 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host etherpad2002.codfw.wmnet with OS bookworm
  • 22:49 ebernhardson@deploy2002: Finished scap: Backport for Connection: Correct read-only detection (T354793 T356526) (duration: 08m 35s)
  • 22:42 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 22:42 ebernhardson@deploy2002: ebernhardson: Backport for Connection: Correct read-only detection (T354793 T356526) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:40 ebernhardson@deploy2002: Started scap: Backport for Connection: Correct read-only detection (T354793 T356526)
  • 22:39 maryum: deployed patch for T357101
  • 22:30 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on sessionstore2003.codfw.wmnet with reason: Decommissioning — T356828
  • 22:30 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on sessionstore2003.codfw.wmnet with reason: Decommissioning — T356828
  • 22:30 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on sessionstore2002.codfw.wmnet with reason: Decommissioning — T356828
  • 22:30 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on sessionstore2002.codfw.wmnet with reason: Decommissioning — T356828
  • 22:29 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on sessionstore2001.codfw.wmnet with reason: Decommissioning — T356828
  • 22:29 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on sessionstore2001.codfw.wmnet with reason: Decommissioning — T356828
  • 22:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 22:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 22:20 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncmonitor1001.eqiad.wmnet
  • 22:20 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncmonitor1001.eqiad.wmnet with OS bookworm
  • 22:16 cjming@deploy2002: Finished scap: Backport for MobileFrontend: Set fallback editor to 'visual' on labs (duration: 07m 53s)
  • 22:15 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:10 cjming@deploy2002: esanders and cjming: Continuing with sync
  • 22:10 cjming@deploy2002: esanders and cjming: Backport for MobileFrontend: Set fallback editor to 'visual' on labs synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:08 cjming@deploy2002: Started scap: Backport for MobileFrontend: Set fallback editor to 'visual' on labs
  • 22:07 cjming@deploy2002: Finished scap: Backport for Make thanks button show again (T357202), Diffs: Localize number in timeago (T357079) (duration: 09m 17s)
  • 22:05 ebernhardson@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:04 ebernhardson@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:00 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:59 cjming@deploy2002: cjming and jdlrobson: Backport for Make thanks button show again (T357202), Diffs: Localize number in timeago (T357079) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:58 cjming@deploy2002: Started scap: Backport for Make thanks button show again (T357202), Diffs: Localize number in timeago (T357079)
  • 21:38 cjming@deploy2002: Finished scap: Backport for Use @wikimedia/mediawiki.skins.clientpreferences@1.1.1 (T357212) (duration: 12m 58s)
  • 21:31 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:26 cjming@deploy2002: cjming and jdlrobson: Backport for Use @wikimedia/mediawiki.skins.clientpreferences@1.1.1 (T357212) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:25 cjming@deploy2002: Started scap: Backport for Use @wikimedia/mediawiki.skins.clientpreferences@1.1.1 (T357212)
  • 20:30 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 20:28 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 20:27 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 20:27 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 20:26 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 20:26 eevans@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 19:48 ejegg: fundraising python tools upgraded from c823e692 to 2d164db5
  • 19:28 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on etherpad2002.codfw.wmnet with reason: host reimage
  • 19:25 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on etherpad2002.codfw.wmnet with reason: host reimage
  • 19:08 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host etherpad2002.codfw.wmnet with OS bookworm
  • 19:07 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 19:06 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 19:06 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2002.codfw.wmnet on all recursors
  • 19:05 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2002.codfw.wmnet on all recursors
  • 19:05 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:05 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 19:04 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 19:02 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 19:02 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad2002.codfw.wmnet
  • 18:58 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync - dzahn@cumin1002"
  • 18:57 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync - dzahn@cumin1002"
  • 18:56 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host etherpad2002.codfw.wmnet
  • 18:56 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2002.codfw.wmnet on all recursors
  • 18:56 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2002.codfw.wmnet on all recursors
  • 18:56 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 18:55 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 18:55 mutante: attempt to create a completely new VM with a new name ALSO FAILS and removes DNS entries
  • 18:53 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:53 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2002.codfw.wmnet on all recursors
  • 18:53 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2002.codfw.wmnet on all recursors
  • 18:53 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 18:52 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2002.codfw.wmnet - dzahn@cumin1002"
  • 18:48 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:48 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad2002.codfw.wmnet
  • 18:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 13:00:00 on db1133.eqiad.wmnet with reason: hush
  • 18:42 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 13:00:00 on db1133.eqiad.wmnet with reason: hush
  • 18:37 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host etherpad2001.codfw.wmnet
  • 18:37 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:37 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:37 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:36 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:35 mutante: attempting decom cookbook on "unverified" host etherpad2001, followed by makevm cookbook to create it again and get out of the cycle of adding and removing DNS records; this fails with "is already in the cluster" even after decom finished (T357159)
  • 18:34 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:34 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:34 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:34 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:33 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:33 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad2001.codfw.wmnet
  • 18:29 mutante: makevm cookbook creates and then removes DNS records; sync-netbox-hiera cookbook fails with raise NetboxError(f"Server {self._server.name} does not have any primary IP with a DNS name set.")
  • 18:29 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync - dzahn@cumin1002"
  • 18:28 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync - dzahn@cumin1002"
  • 18:25 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host etherpad2001.codfw.wmnet
  • 18:25 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:25 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:25 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:24 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:23 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:23 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:23 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:22 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:20 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:20 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad2001.codfw.wmnet
  • 18:19 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host etherpad2001.codfw.wmnet
  • 18:19 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:19 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:18 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:18 mutante: makevm cookbook in a cycle of adding and then removing DNS records
  • 18:17 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:16 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw2388.codfw.wmnet with reason: Envoy config changed for ipoid
  • 18:16 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw2388.codfw.wmnet with reason: Envoy config changed for ipoid
  • 18:16 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:15 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:15 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:15 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:15 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:14 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:12 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:12 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad2001.codfw.wmnet
  • 18:11 mutante: spicerack.netbox.NetboxError: Server etherpad2001 does not have any primary IP with a DNS name set.
  • 18:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:07 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host etherpad2001.codfw.wmnet
  • 18:07 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 18:07 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:07 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 18:06 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:05 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 18:03 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 18:03 dzahn@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 17:59 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 17:58 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad2001.codfw.wmnet on all recursors
  • 17:58 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad2001.codfw.wmnet on all recursors
  • 17:58 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 17:55 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:55 dzahn@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 17:46 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad2001.codfw.wmnet - dzahn@cumin1002"
  • 17:42 hnowlan@deploy2002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:42 hnowlan@deploy2002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 17:41 hnowlan@deploy2002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 17:41 hnowlan@deploy2002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 17:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 17:35 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 17:35 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad2001.codfw.wmnet
  • 17:17 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:15 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:15 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:13 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:07 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56686 and previous config saved to /var/cache/conftool/dbconfig/20240212-170423-arnaudb.json
  • 16:56 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:54 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 00s)
  • 16:49 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56685 and previous config saved to /var/cache/conftool/dbconfig/20240212-164918-arnaudb.json
  • 16:48 jgiannelos@deploy2002: Finished deploy [restbase/deploy@228b93d]: (no justification provided) (duration: 16m 16s)
  • 16:47 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 07s)
  • 16:34 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56684 and previous config saved to /var/cache/conftool/dbconfig/20240212-163413-arnaudb.json
  • 16:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Removing instances as per T350458', diff saved to https://phabricator.wikimedia.org/P56683 and previous config saved to /var/cache/conftool/dbconfig/20240212-163407-arnaudb.json
  • 16:32 jgiannelos@deploy2002: Started deploy [restbase/deploy@228b93d]: (no justification provided)
  • 16:22 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Disable JSON Dump tests to prepare for schema change in Wikibase (T305660), Return stdClass/Object from Serializers for empty lists (T305660), Change expected serialization format of JSON dumps to include arrays (T305660) (duration: 09m 42s)
  • 16:16 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Continuing with sync
  • 16:14 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde: Backport for Disable JSON Dump tests to prepare for schema change in Wikibase (T305660), Return stdClass/Object from Serializers for empty lists (T305660), Change expected serialization format of JSON dumps to include arrays (T305660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:13 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Disable JSON Dump tests to prepare for schema change in Wikibase (T305660), Return stdClass/Object from Serializers for empty lists (T305660), Change expected serialization format of JSON dumps to include arrays (T305660)
  • 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 40%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56682 and previous config saved to /var/cache/conftool/dbconfig/20240212-155325-arnaudb.json
  • 15:51 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 15:50 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 15:48 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 15:48 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 15:46 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:46 eevans@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:42 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 30%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56681 and previous config saved to /var/cache/conftool/dbconfig/20240212-153820-arnaudb.json
  • 15:36 denisse: Failover Back to grafana1002 - T352665
  • 15:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bookworm
  • 15:34 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host grafana1002.eqiad.wmnet with OS bookworm
  • 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56680 and previous config saved to /var/cache/conftool/dbconfig/20240212-152315-arnaudb.json
  • 15:19 denisse@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on grafana1002.eqiad.wmnet with reason: host reimage
  • 15:16 denisse@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on grafana1002.eqiad.wmnet with reason: host reimage
  • 15:14 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 15:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: testing db2194 done', diff saved to https://phabricator.wikimedia.org/P56679 and previous config saved to /var/cache/conftool/dbconfig/20240212-150810-arnaudb.json
  • 15:08 denisse@cumin2002: START - Cookbook sre.hosts.reimage for host grafana1002.eqiad.wmnet with OS bookworm
  • 15:07 denisse: Reimage Standby Host (grafana1002) - T352665
  • 15:06 ejegg: re-enabled thank you mailer and donations queue consumer
  • 15:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.codfw.wmnet
  • 14:56 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.codfw.wmnet
  • 14:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
  • 14:51 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bookworm
  • 14:49 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
  • 14:47 denisse: Completed failover to grafana2001 - T352665
  • 14:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 14:42 ejegg: fundraising civicrm upgraded from c66b04bd to 684286b4
  • 14:41 ejegg: disabled thank you mailer and donations queue consumer
  • 14:36 denisse: starting Upgrade Grafana hosts to Bookworm - T352665
  • 14:33 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:32 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for uzwiki: remove temporary logo files (T353723) (duration: 09m 53s)
  • 14:27 moritzm: installing Linux 6.1.76 on Bookworm hosts
  • 14:26 logmsgbot: lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Continuing with sync
  • 14:24 logmsgbot: lucaswerkmeister-wmde@deploy2002 anzx and lucaswerkmeister-wmde: Backport for uzwiki: remove temporary logo files (T353723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:23 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for uzwiki: remove temporary logo files (T353723)
  • 14:20 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Set $wgMinervaEnableSiteNotice for arwikisource (T356460) (duration: 09m 05s)
  • 14:13 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and hubaishan: Continuing with sync
  • 14:12 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and hubaishan: Backport for Set $wgMinervaEnableSiteNotice for arwikisource (T356460) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:10 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Set $wgMinervaEnableSiteNotice for arwikisource (T356460)
  • 14:02 ladsgroup@deploy2002: Finished scap: Backport for Stop writing to old pagelinks schema in s4 (T352010) (duration: 23m 12s)
  • 13:57 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2005-dev.codfw.wmnet
  • 13:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 13:49 ladsgroup@deploy2002: ladsgroup: Backport for Stop writing to old pagelinks schema in s4 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:44 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.codfw.wmnet
  • 13:44 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2004-dev.codfw.wmnet
  • 13:39 ladsgroup@deploy2002: Started scap: Backport for Stop writing to old pagelinks schema in s4 (T352010)
  • 13:35 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 13:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 13:34 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 13:32 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 13:31 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 13:27 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.codfw.wmnet
  • 13:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 13:21 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2180.codfw.wmnet onto db2194.codfw.wmnet
  • 12:55 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.codfw.wmnet
  • 12:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet
  • 12:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
  • 12:47 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
  • 12:43 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
  • 12:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 12:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 12:19 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for stat1005.eqiad.wmnet: Renew puppet certificate - brouberol@cumin1002
  • 12:17 brouberol@cumin1002: START - Cookbook sre.puppet.renew-cert for stat1005.eqiad.wmnet: Renew puppet certificate - brouberol@cumin1002
  • 12:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1040.eqiad.wmnet with OS bullseye
  • 12:14 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 12:14 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1034.eqiad.wmnet with OS bullseye
  • 12:14 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1037.eqiad.wmnet with OS bullseye
  • 12:14 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:14 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 12:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1038.eqiad.wmnet with OS bullseye
  • 12:13 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1039.eqiad.wmnet with OS bullseye
  • 12:13 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1041.eqiad.wmnet with OS bullseye
  • 12:13 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1042.eqiad.wmnet with OS bullseye
  • 12:13 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36236
  • 12:02 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 36236
  • 12:01 taavi: taavi@gerrit1003 ~ $ sudo systemctl restart apache2
  • 11:48 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki frwiki --current --all --touched-after=20230613000000 --start '["7544396"]' 2>&1 | tee ~/T315510-frwiki # in tmux
  • 11:46 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki viwiki --current --all --touched-after=20230613000000 2>&1 | tee ~/T315510-viwiki # in tmux
  • 10:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138881
  • 10:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2190 T343674', diff saved to https://phabricator.wikimedia.org/P56677 and previous config saved to /var/cache/conftool/dbconfig/20240212-102046-arnaudb.json
  • 10:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: T343674 testing cloning a single instance node to a multi-instance one
  • 10:19 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 138881
  • 10:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: T343674 testing cloning a single instance node to a multi-instance one
  • 10:08 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Remove legacy codfw vc switches from synced hiera data after netbox status change - cmooney@cumin1002 - T355544"
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P56676 and previous config saved to /var/cache/conftool/dbconfig/20240212-100655-ladsgroup.json
  • 10:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Remove legacy codfw vc switches from synced hiera data after netbox status change - cmooney@cumin1002 - T355544"
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P56675 and previous config saved to /var/cache/conftool/dbconfig/20240212-095150-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P56674 and previous config saved to /var/cache/conftool/dbconfig/20240212-093645-ladsgroup.json
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1002.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver1002.eqiad.wmnet
  • 09:21 moritzm: restarting archiva to pick up Java security updates
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P56672 and previous config saved to /var/cache/conftool/dbconfig/20240212-092140-ladsgroup.json
  • 09:16 moritzm: installing Java 8 security updates on Buster
  • 08:58 taavi@cumin1002: conftool action : set/pooled=yes; selector: name=cloudweb1003.wikimedia.org
  • 08:54 taavi@cumin1002: conftool action : set/pooled=no; selector: name=cloudweb1003.wikimedia.org
  • 08:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS bullseye
  • 08:52 hashar@deploy2002: Finished deploy [gerrit/gerrit@db69b2b]: Bump javascript from es2018 to es2020 (duration: 00m 07s)
  • 08:52 hashar@deploy2002: Started deploy [gerrit/gerrit@db69b2b]: Bump javascript from es2018 to es2020
  • 08:26 hashar@deploy2002: Finished deploy [integration/docroot@2360fa1]: Updating eslint-config-wikimedia and mediawiki-phan-config (duration: 00m 06s)
  • 08:26 hashar@deploy2002: Started deploy [integration/docroot@2360fa1]: Updating eslint-config-wikimedia and mediawiki-phan-config
  • 08:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 08:23 moritzm: update netboot image for Bookworm 12.5 point release T357133
  • 08:23 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
  • 08:11 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS bullseye
  • 08:11 XioNoX: set esams NL-IX peering as primary
  • 08:10 moritzm: update netboot image for Bullseye 11.9 point release T357144
  • 08:03 taavi@cumin1002: conftool action : set/pooled=inactive; selector: name=cloudweb1003.wikimedia.org
  • 07:44 vgutierrez: upload golang-github-u-root-u-root_0.11.0 to apt.wm.o (bookworm)

2024-02-11

  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T352010)', diff saved to https://phabricator.wikimedia.org/P56670 and previous config saved to /var/cache/conftool/dbconfig/20240211-195509-ladsgroup.json
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P56669 and previous config saved to /var/cache/conftool/dbconfig/20240211-194002-ladsgroup.json
  • 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P56668 and previous config saved to /var/cache/conftool/dbconfig/20240211-192455-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T352010)', diff saved to https://phabricator.wikimedia.org/P56667 and previous config saved to /var/cache/conftool/dbconfig/20240211-190948-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2192 (T352010)', diff saved to https://phabricator.wikimedia.org/P56666 and previous config saved to /var/cache/conftool/dbconfig/20240211-165910-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T352010)', diff saved to https://phabricator.wikimedia.org/P56665 and previous config saved to /var/cache/conftool/dbconfig/20240211-165848-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P56664 and previous config saved to /var/cache/conftool/dbconfig/20240211-164341-ladsgroup.json
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P56663 and previous config saved to /var/cache/conftool/dbconfig/20240211-162834-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T352010)', diff saved to https://phabricator.wikimedia.org/P56662 and previous config saved to /var/cache/conftool/dbconfig/20240211-161328-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T352010)', diff saved to https://phabricator.wikimedia.org/P56661 and previous config saved to /var/cache/conftool/dbconfig/20240211-132638-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56660 and previous config saved to /var/cache/conftool/dbconfig/20240211-132617-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P56659 and previous config saved to /var/cache/conftool/dbconfig/20240211-131110-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P56658 and previous config saved to /var/cache/conftool/dbconfig/20240211-125603-ladsgroup.json
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56657 and previous config saved to /var/cache/conftool/dbconfig/20240211-124057-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56656 and previous config saved to /var/cache/conftool/dbconfig/20240211-094158-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T352010)', diff saved to https://phabricator.wikimedia.org/P56655 and previous config saved to /var/cache/conftool/dbconfig/20240211-094136-ladsgroup.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P56654 and previous config saved to /var/cache/conftool/dbconfig/20240211-092630-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P56653 and previous config saved to /var/cache/conftool/dbconfig/20240211-091123-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T352010)', diff saved to https://phabricator.wikimedia.org/P56652 and previous config saved to /var/cache/conftool/dbconfig/20240211-085616-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T352010)', diff saved to https://phabricator.wikimedia.org/P56651 and previous config saved to /var/cache/conftool/dbconfig/20240211-055427-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56650 and previous config saved to /var/cache/conftool/dbconfig/20240211-055405-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P56649 and previous config saved to /var/cache/conftool/dbconfig/20240211-053858-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P56648 and previous config saved to /var/cache/conftool/dbconfig/20240211-052352-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56647 and previous config saved to /var/cache/conftool/dbconfig/20240211-050845-ladsgroup.json
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56646 and previous config saved to /var/cache/conftool/dbconfig/20240211-015257-ladsgroup.json
  • 01:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 01:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T352010)', diff saved to https://phabricator.wikimedia.org/P56645 and previous config saved to /var/cache/conftool/dbconfig/20240211-015236-ladsgroup.json
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56644 and previous config saved to /var/cache/conftool/dbconfig/20240211-013729-ladsgroup.json
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56643 and previous config saved to /var/cache/conftool/dbconfig/20240211-012222-ladsgroup.json
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T352010)', diff saved to https://phabricator.wikimedia.org/P56642 and previous config saved to /var/cache/conftool/dbconfig/20240211-010715-ladsgroup.json

2024-02-10

  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T352010)', diff saved to https://phabricator.wikimedia.org/P56641 and previous config saved to /var/cache/conftool/dbconfig/20240210-215952-ladsgroup.json
  • 21:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T352010)', diff saved to https://phabricator.wikimedia.org/P56640 and previous config saved to /var/cache/conftool/dbconfig/20240210-215913-ladsgroup.json
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56639 and previous config saved to /var/cache/conftool/dbconfig/20240210-214405-ladsgroup.json
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56638 and previous config saved to /var/cache/conftool/dbconfig/20240210-212859-ladsgroup.json
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T352010)', diff saved to https://phabricator.wikimedia.org/P56637 and previous config saved to /var/cache/conftool/dbconfig/20240210-211352-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T352010)', diff saved to https://phabricator.wikimedia.org/P56636 and previous config saved to /var/cache/conftool/dbconfig/20240210-192353-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P56635 and previous config saved to /var/cache/conftool/dbconfig/20240210-192312-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P56634 and previous config saved to /var/cache/conftool/dbconfig/20240210-190805-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P56633 and previous config saved to /var/cache/conftool/dbconfig/20240210-185258-ladsgroup.json
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P56632 and previous config saved to /var/cache/conftool/dbconfig/20240210-183752-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T352010)', diff saved to https://phabricator.wikimedia.org/P56631 and previous config saved to /var/cache/conftool/dbconfig/20240210-181424-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T352010)', diff saved to https://phabricator.wikimedia.org/P56630 and previous config saved to /var/cache/conftool/dbconfig/20240210-181403-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56629 and previous config saved to /var/cache/conftool/dbconfig/20240210-175856-ladsgroup.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56628 and previous config saved to /var/cache/conftool/dbconfig/20240210-174349-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T352010)', diff saved to https://phabricator.wikimedia.org/P56627 and previous config saved to /var/cache/conftool/dbconfig/20240210-172843-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T352010)', diff saved to https://phabricator.wikimedia.org/P56626 and previous config saved to /var/cache/conftool/dbconfig/20240210-140241-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T352010)', diff saved to https://phabricator.wikimedia.org/P56625 and previous config saved to /var/cache/conftool/dbconfig/20240210-112150-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T352010)', diff saved to https://phabricator.wikimedia.org/P56624 and previous config saved to /var/cache/conftool/dbconfig/20240210-112129-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P56623 and previous config saved to /var/cache/conftool/dbconfig/20240210-110622-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P56622 and previous config saved to /var/cache/conftool/dbconfig/20240210-105116-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T352010)', diff saved to https://phabricator.wikimedia.org/P56621 and previous config saved to /var/cache/conftool/dbconfig/20240210-103609-ladsgroup.json
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56620 and previous config saved to /var/cache/conftool/dbconfig/20240210-054721-ladsgroup.json
  • 05:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315', diff saved to https://phabricator.wikimedia.org/P56619 and previous config saved to /var/cache/conftool/dbconfig/20240210-053215-ladsgroup.json
  • 05:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315', diff saved to https://phabricator.wikimedia.org/P56618 and previous config saved to /var/cache/conftool/dbconfig/20240210-051708-ladsgroup.json
  • 05:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56617 and previous config saved to /var/cache/conftool/dbconfig/20240210-050201-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T352010)', diff saved to https://phabricator.wikimedia.org/P56615 and previous config saved to /var/cache/conftool/dbconfig/20240210-032801-ladsgroup.json
  • 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P56614 and previous config saved to /var/cache/conftool/dbconfig/20240210-031255-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P56613 and previous config saved to /var/cache/conftool/dbconfig/20240210-025748-ladsgroup.json
  • 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T352010)', diff saved to https://phabricator.wikimedia.org/P56612 and previous config saved to /var/cache/conftool/dbconfig/20240210-024242-ladsgroup.json
  • 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1244:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56611 and previous config saved to /var/cache/conftool/dbconfig/20240210-021141-ladsgroup.json
  • 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P56610 and previous config saved to /var/cache/conftool/dbconfig/20240210-021119-ladsgroup.json
  • 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P56609 and previous config saved to /var/cache/conftool/dbconfig/20240210-015612-ladsgroup.json
  • 01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P56608 and previous config saved to /var/cache/conftool/dbconfig/20240210-014106-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P56607 and previous config saved to /var/cache/conftool/dbconfig/20240210-012559-ladsgroup.json

2024-02-09

  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P56606 and previous config saved to /var/cache/conftool/dbconfig/20240209-230425-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 21:39 inflatador: bking@deploy2002 install 'python3-boto3' pkg T348685
  • 21:36 inflatador: bking@deploy2002 install 'python3-plac' pkg T348685
  • 21:09 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new systemd settings - bking@cumin2002 - T355617
  • 21:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new systemd settings - bking@cumin2002 - T355617
  • 20:55 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply new systemd settings - bking@cumin2002 - T355617
  • 20:46 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply new systemd settings - bking@cumin2002 - T355617
  • 20:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56605 and previous config saved to /var/cache/conftool/dbconfig/20240209-202830-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P56604 and previous config saved to /var/cache/conftool/dbconfig/20240209-201324-ladsgroup.json
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P56603 and previous config saved to /var/cache/conftool/dbconfig/20240209-195817-ladsgroup.json
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56602 and previous config saved to /var/cache/conftool/dbconfig/20240209-194310-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T352010)', diff saved to https://phabricator.wikimedia.org/P56601 and previous config saved to /var/cache/conftool/dbconfig/20240209-193452-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T352010)', diff saved to https://phabricator.wikimedia.org/P56600 and previous config saved to /var/cache/conftool/dbconfig/20240209-193430-ladsgroup.json
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P56599 and previous config saved to /var/cache/conftool/dbconfig/20240209-191923-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P56598 and previous config saved to /var/cache/conftool/dbconfig/20240209-190416-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T352010)', diff saved to https://phabricator.wikimedia.org/P56597 and previous config saved to /var/cache/conftool/dbconfig/20240209-184910-ladsgroup.json
  • 18:49 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host etherpad1004.eqiad.wmnet with OS bookworm
  • 18:39 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:38 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:37 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:37 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.eqiad.wmnet with OS bullseye
  • 18:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 18:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:35 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:35 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:35 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on etherpad1004.eqiad.wmnet with reason: host reimage
  • 18:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 18:32 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on etherpad1004.eqiad.wmnet with reason: host reimage
  • 18:19 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host etherpad1004.eqiad.wmnet with OS bookworm
  • 18:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.eqiad.wmnet with reason: host reimage
  • 18:11 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.eqiad.wmnet with reason: host reimage
  • 17:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
  • 17:43 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host etherpad1004.eqiad.wmnet
  • 17:43 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host etherpad1004.eqiad.wmnet with OS bookworm
  • 17:43 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host etherpad1004.eqiad.wmnet with OS bookworm
  • 17:41 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM etherpad1004.eqiad.wmnet - dzahn@cumin1002"
  • 17:41 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM etherpad1004.eqiad.wmnet - dzahn@cumin1002"
  • 17:40 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) etherpad1004.eqiad.wmnet on all recursors
  • 17:40 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache etherpad1004.eqiad.wmnet on all recursors
  • 17:40 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad1004.eqiad.wmnet - dzahn@cumin1002"
  • 17:39 mutante: merging netbox/hiera data changes that add restbase hosts and show up when I run unrelated cookbook creating a new VM - T354893
  • 17:35 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM etherpad1004.eqiad.wmnet - dzahn@cumin1002"
  • 17:30 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 17:30 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host etherpad1004.eqiad.wmnet
  • 17:18 cdanis: rolling restart of pods on k8s aux eqiad T356661
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T352010)', diff saved to https://phabricator.wikimedia.org/P56594 and previous config saved to /var/cache/conftool/dbconfig/20240209-171225-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T352010)', diff saved to https://phabricator.wikimedia.org/P56593 and previous config saved to /var/cache/conftool/dbconfig/20240209-171203-ladsgroup.json
  • 17:11 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for color-link-visited was not defined (T356928) (duration: 13m 13s)
  • 17:04 logmsgbot: lucaswerkmeister-wmde@deploy2002 jdlrobson and lucaswerkmeister-wmde: Continuing with sync
  • 16:59 logmsgbot: lucaswerkmeister-wmde@deploy2002 jdlrobson and lucaswerkmeister-wmde: Backport for color-link-visited was not defined (T356928) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:57 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for color-link-visited was not defined (T356928)
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P56592 and previous config saved to /var/cache/conftool/dbconfig/20240209-165657-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P56591 and previous config saved to /var/cache/conftool/dbconfig/20240209-164150-ladsgroup.json
  • 16:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T352010)', diff saved to https://phabricator.wikimedia.org/P56590 and previous config saved to /var/cache/conftool/dbconfig/20240209-162643-ladsgroup.json
  • 16:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:18 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1035.eqiad.wmnet with OS bullseye
  • 15:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1035.eqiad.wmnet with OS bullseye
  • 15:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1005*,cloudelastic1006*,cloudelastic1007*,cloudelastic1008* for IP migration - bking@cumin2002 - T355617
  • 15:05 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1005*,cloudelastic1006*,cloudelastic1007*,cloudelastic1008* for IP migration - bking@cumin2002 - T355617
  • 14:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1035.eqiad.wmnet with OS bullseye
  • 14:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 14:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T352010)', diff saved to https://phabricator.wikimedia.org/P56588 and previous config saved to /var/cache/conftool/dbconfig/20240209-135337-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T352010)', diff saved to https://phabricator.wikimedia.org/P56587 and previous config saved to /var/cache/conftool/dbconfig/20240209-135315-ladsgroup.json
  • 13:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1040.eqiad.wmnet with reason: host reimage
  • 13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:46 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 13:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1041.eqiad.wmnet with reason: host reimage
  • 13:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1037.eqiad.wmnet with reason: host reimage
  • 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1042.eqiad.wmnet with reason: host reimage
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P56586 and previous config saved to /var/cache/conftool/dbconfig/20240209-133809-ladsgroup.json
  • 13:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1038.eqiad.wmnet with reason: host reimage
  • 13:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1039.eqiad.wmnet with reason: host reimage
  • 13:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1040.eqiad.wmnet with reason: host reimage
  • 13:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1041.eqiad.wmnet with reason: host reimage
  • 13:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1042.eqiad.wmnet with reason: host reimage
  • 13:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1034.eqiad.wmnet with reason: host reimage
  • 13:31 topranks: enabling BGP peering to NL-IX (new IXP connection) route servers from cr2-esams T322630
  • 13:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1039.eqiad.wmnet with reason: host reimage
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1037.eqiad.wmnet with reason: host reimage
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1038.eqiad.wmnet with reason: host reimage
  • 13:26 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1034.eqiad.wmnet with reason: host reimage
  • 13:25 jmm@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P56585 and previous config saved to /var/cache/conftool/dbconfig/20240209-132302-ladsgroup.json
  • 13:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1042.eqiad.wmnet with OS bullseye
  • 13:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1041.eqiad.wmnet with OS bullseye
  • 13:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1040.eqiad.wmnet with OS bullseye
  • 13:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1039.eqiad.wmnet with OS bullseye
  • 13:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1038.eqiad.wmnet with OS bullseye
  • 13:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1037.eqiad.wmnet with OS bullseye
  • 13:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1035.eqiad.wmnet with OS bullseye
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1034.eqiad.wmnet with OS bullseye
  • 13:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1039']
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T352010)', diff saved to https://phabricator.wikimedia.org/P56584 and previous config saved to /var/cache/conftool/dbconfig/20240209-130755-ladsgroup.json
  • 13:07 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1039']
  • 13:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1034']
  • 13:07 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1034']
  • 13:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for puppetserver2003 - cmooney@cumin1002"
  • 13:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new entries for puppetserver2003 - cmooney@cumin1002"
  • 13:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1034']
  • 13:03 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1034']
  • 13:02 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 13:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1034.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1042.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:50 stran@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 12:49 stran@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 12:49 stran@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 12:48 stran@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 12:47 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:47 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:47 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:45 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:45 stran@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:44 stran@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:43 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be1044
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2003 rename - jmm@cumin2002"
  • 12:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2003 rename - jmm@cumin2002"
  • 12:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1034.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:28 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1042.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:24 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be1044
  • 12:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-be1044.eqiad.wmnet
  • 12:04 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be1044.eqiad.wmnet
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T352010)', diff saved to https://phabricator.wikimedia.org/P56583 and previous config saved to /var/cache/conftool/dbconfig/20240209-114208-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:41 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 11:40 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 11:39 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 11:39 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 11:32 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:30 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:26 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 11:25 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T352010)', diff saved to https://phabricator.wikimedia.org/P56582 and previous config saved to /var/cache/conftool/dbconfig/20240209-102336-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T352010)', diff saved to https://phabricator.wikimedia.org/P56581 and previous config saved to /var/cache/conftool/dbconfig/20240209-102314-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P56580 and previous config saved to /var/cache/conftool/dbconfig/20240209-100808-ladsgroup.json
  • 09:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bookworm
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P56579 and previous config saved to /var/cache/conftool/dbconfig/20240209-095301-ladsgroup.json
  • 09:46 moritzm: uploaded openjdk-8 8u402-ga-2~deb10u1 for buster-wikimedia (backport of latest Java 8 security updates)
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T352010)', diff saved to https://phabricator.wikimedia.org/P56578 and previous config saved to /var/cache/conftool/dbconfig/20240209-093754-ladsgroup.json
  • 09:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 09:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 09:08 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bookworm
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster2003.codfw.wmnet
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:29 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2003.codfw.wmnet
  • 06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T352010)', diff saved to https://phabricator.wikimedia.org/P56577 and previous config saved to /var/cache/conftool/dbconfig/20240209-065147-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T352010)', diff saved to https://phabricator.wikimedia.org/P56576 and previous config saved to /var/cache/conftool/dbconfig/20240209-065125-ladsgroup.json
  • 06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1124.eqiad.wmnet
  • 06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1124.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 06:36 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1124.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P56575 and previous config saved to /var/cache/conftool/dbconfig/20240209-063618-ladsgroup.json
  • 06:34 marostegui@cumin1002: START - Cookbook sre.dns.netbox
  • 06:29 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1124.eqiad.wmnet
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P56574 and previous config saved to /var/cache/conftool/dbconfig/20240209-062111-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T352010)', diff saved to https://phabricator.wikimedia.org/P56573 and previous config saved to /var/cache/conftool/dbconfig/20240209-060605-ladsgroup.json
  • 05:48 marostegui: dbmaint Schema change on s7@codfw T357067
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T352010)', diff saved to https://phabricator.wikimedia.org/P56572 and previous config saved to /var/cache/conftool/dbconfig/20240209-030028-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 02:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
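
The cookbook entries above follow a regular shape (`END (STATUS) - Cookbook name (exit_code=N)`), so a day's outcomes can be tallied mechanically. The following is a minimal sketch, not an official SAL tool: the regular expression, the function name `tally_cookbook_results`, and the assumed entry layout are illustrative and inferred only from the entries in this archive.

```python
import re
from collections import Counter

# Assumed entry shape, inferred from the log entries above:
#   "• HH:MM user@host: END (PASS) - Cookbook sre.some.name (exit_code=0) ..."
END_RE = re.compile(
    r"^\s*•\s+(?P<time>\d{2}:\d{2})\s+(?P<who>\S+):\s+"
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count (cookbook, status) pairs for END entries; all other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = END_RE.match(line)
        if match:
            counts[(match["cookbook"], match["status"])] += 1
    return counts


# Sample lines copied from the 2024-02-09 entries above.
sample = [
    "  • 12:43 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be1044",
    "  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)",
    "  • 12:33 jmm@cumin2002: START - Cookbook sre.dns.netbox",
    "  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)",
]

results = tally_cookbook_results(sample)
for (cookbook, status), count in sorted(results.items()):
    print(f"{cookbook}: {status} x{count}")
```

START entries and free-form entries (for example the dbctl commits) do not match the pattern and are ignored, so the tally covers only completed cookbook runs.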

2024-02-08

  • 23:57 volans@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 23:56 volans@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 23:50 foks: removing 14 files for legal compliance
  • 23:28 foks: removing one file for legal compliance
  • 23:17 foks: removing two files for legal compliance
  • 22:58 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
  • 22:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
  • 22:51 jhathaway: made a stupid mistake and accidentally installed knot & unbound on dns1004; based on logs I don't think any harm was caused, and they have since been removed
  • 22:44 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: racked and provision network restbase servers - jclark@cumin1002"
  • 22:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: racked and provision network restbase servers - jclark@cumin1002"
  • 22:41 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:38 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1005*,cloudelastic1006*,cloudelastic1007*,cloudelastic1008* for IP migration - bking@cumin2002 - T355617
  • 22:38 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1005*,cloudelastic1006*,cloudelastic1007*,cloudelastic1008* for IP migration - bking@cumin2002 - T355617
  • 22:26 vriley@cumin1001: START - Cookbook sre.hosts.provision for host restbase1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host restbase1034.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: racked and provision network restbase servers - jclark@cumin1002"
  • 22:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: racked and provision network restbase servers - jclark@cumin1002"
  • 22:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:11 topranks: adding missing external-links group to AMS-IX peering port ae1.380 cr1-esams
  • 22:06 urbanecm@deploy2002: Finished scap: Backport for Echo: Conditional defaults: Fix start timestamp (T353225) (duration: 09m 29s)
  • 22:00 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 21:59 Daimona: T357007 Running mwscript /home/daimona/GenerateInvitationList.php --wiki=metawiki --listfile=/home/daimona/list2.txt (same as current master)
  • 21:58 urbanecm@deploy2002: urbanecm: Backport for Echo: Conditional defaults: Fix start timestamp (T353225) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:57 urbanecm@deploy2002: Started scap: Backport for Echo: Conditional defaults: Fix start timestamp (T353225)
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P56571 and previous config saved to /var/cache/conftool/dbconfig/20240208-214640-root.json
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P56570 and previous config saved to /var/cache/conftool/dbconfig/20240208-214625-root.json
  • 21:46 urbanecm@deploy2002: Finished scap: Backport for Echo: Use conditional defaults for 4 user properties (T353225) (duration: 09m 07s)
  • 21:40 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 21:38 urbanecm@deploy2002: urbanecm: Backport for Echo: Use conditional defaults for 4 user properties (T353225) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:37 urbanecm@deploy2002: Started scap: Backport for Echo: Use conditional defaults for 4 user properties (T353225)
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P56569 and previous config saved to /var/cache/conftool/dbconfig/20240208-213135-root.json
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P56568 and previous config saved to /var/cache/conftool/dbconfig/20240208-213120-root.json
  • 21:25 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.17 refs T354435
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P56567 and previous config saved to /var/cache/conftool/dbconfig/20240208-211630-root.json
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P56566 and previous config saved to /var/cache/conftool/dbconfig/20240208-211615-root.json
  • 21:13 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.17 refs T354435 (duration: 06m 52s)
  • 21:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 21:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 21:06 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.17 refs T354435
  • 21:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 21:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P56565 and previous config saved to /var/cache/conftool/dbconfig/20240208-210125-root.json
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P56564 and previous config saved to /var/cache/conftool/dbconfig/20240208-210110-root.json
  • 20:55 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:55 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2003 - cmooney@cumin1002"
  • 20:55 brennen@deploy2002: Finished scap: Backport for Revert "Migrate `editResponseTime` metric to Prometheus store" (T357050) (duration: 09m 17s)
  • 20:54 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2003 - cmooney@cumin1002"
  • 20:48 brennen@deploy2002: brennen: Continuing with sync
  • 20:47 brennen@deploy2002: brennen: Backport for Revert "Migrate `editResponseTime` metric to Prometheus store" (T357050) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 brennen@deploy2002: Started scap: Backport for Revert "Migrate `editResponseTime` metric to Prometheus store" (T357050)
  • 20:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:24 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:24 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2003 - cmooney@cumin1002"
  • 20:24 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2003 - cmooney@cumin1002"
  • 20:18 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:17 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:16 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:01 brennen: train 1.42.0-wmf.17 (T354435): currently rolled back to group0; blocked pending a fix for edit metrics (further details to come)
  • 18:58 ejegg: re-enabled fundraising scheduled jobs
  • 18:49 ejegg: standalone SmashPig upgraded from 20d6434e to 669a9fe3
  • 18:48 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.17 refs T354435
  • 18:41 ejegg: jobs disabled for option change
  • 18:03 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 18:02 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 18:02 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 18:01 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 18:01 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 18:00 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 17:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2020 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56563 and previous config saved to /var/cache/conftool/dbconfig/20240208-175206-root.json
  • 17:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56562 and previous config saved to /var/cache/conftool/dbconfig/20240208-175149-root.json
  • 17:45 mutante: deploy1002/deploy2002 - change in scap foreachwikiindblist deployed (gerrit:992263)
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2020 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56561 and previous config saved to /var/cache/conftool/dbconfig/20240208-173701-root.json
  • 17:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56560 and previous config saved to /var/cache/conftool/dbconfig/20240208-173644-root.json
  • 17:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P56559 and previous config saved to /var/cache/conftool/dbconfig/20240208-172902-root.json
  • 17:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2020 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56558 and previous config saved to /var/cache/conftool/dbconfig/20240208-172156-root.json
  • 17:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56557 and previous config saved to /var/cache/conftool/dbconfig/20240208-172139-root.json
  • 17:15 brennen@deploy2002: Synchronized php: group1 wikis to 1.42.0-wmf.17 refs T354435 (duration: 06m 52s)
  • 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P56556 and previous config saved to /var/cache/conftool/dbconfig/20240208-171358-root.json
  • 17:09 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.42.0-wmf.17 refs T354435
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2020 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56555 and previous config saved to /var/cache/conftool/dbconfig/20240208-170651-root.json
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56554 and previous config saved to /var/cache/conftool/dbconfig/20240208-170634-root.json
  • 17:01 brennen: train 1.42.0-wmf.17 (T354435): blockers resolved, rolling to group1
  • 16:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P56553 and previous config saved to /var/cache/conftool/dbconfig/20240208-165853-root.json
  • 16:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2020 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56552 and previous config saved to /var/cache/conftool/dbconfig/20240208-165147-root.json
  • 16:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 16:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 10%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56551 and previous config saved to /var/cache/conftool/dbconfig/20240208-165129-root.json
  • 16:48 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=(mw2379|mw2380|mw2382|mw2383|mw2384|mw2385|mw2386|mw2387|mw2388|mw2389|mw2390|mw2391|mw2392|mw2393|mw2394|mw2396|mw2397|mw2398|mw2399|mw2400|mw2298|mw2299|mw2300).*
  • 16:48 claime: Repooling mw2379|mw2380|mw2382|mw2383|mw2384|mw2385|mw2386|mw2387|mw2388|mw2389|mw2390|mw2391|mw2392|mw2393|mw2394|mw2396|mw2397|mw2398|mw2399|mw2400|mw2298|mw2299|mw2300 - T355862
  • 16:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P56550 and previous config saved to /var/cache/conftool/dbconfig/20240208-164348-root.json
  • 16:40 claime: Uncordoning mw2377.codfw.wmnet mw2378.codfw.wmnet mw2381.codfw.wmnet mw2395.codfw.wmnet mw2291.codfw.wmnet mw2292.codfw.wmnet mw2293.codfw.wmnet mw2294.codfw.wmnet mw2295.codfw.wmnet mw2296.codfw.wmnet mw2297.codfw.wmnet - T355862
  • 16:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw-a-codfw,cr[1-2]-codfw,lsw1-a3-codfw.mgmt
  • 16:37 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for asw-a-codfw,cr[1-2]-codfw,lsw1-a3-codfw.mgmt
  • 16:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2020 (re)pooling @ 5%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56549 and previous config saved to /var/cache/conftool/dbconfig/20240208-163642-root.json
  • 16:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2103 (re)pooling @ 5%: After network maintenance', diff saved to https://phabricator.wikimedia.org/P56548 and previous config saved to /var/cache/conftool/dbconfig/20240208-163624-root.json
  • 16:31 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 16:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P56547 and previous config saved to /var/cache/conftool/dbconfig/20240208-162843-root.json
  • 16:26 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 16:23 topranks: Server move completed codfw rack A3 T355862
  • 16:15 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=commonswiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-commonswiki-sleep-30-no-render-now.txt` on a tmux session - See https://wikitech.wikimedia.org/wiki/MediaModeration
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P56546 and previous config saved to /var/cache/conftool/dbconfig/20240208-161338-root.json
  • 16:10 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 39 hosts with reason: Migrating servers in codfw rack A3 to lsw1-a3-codfw
  • 16:10 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 16:09 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 39 hosts with reason: Migrating servers in codfw rack A3 to lsw1-a3-codfw
  • 16:09 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a3-codfw.mgmt with reason: server uplink migration codfw rack a3
  • 16:09 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-a-codfw,cr[1-2]-codfw,lsw1-a3-codfw.mgmt with reason: server uplink migration codfw rack a3
  • 16:07 topranks: Commencing server uplink moves from old switch to new in codfw rack A3 T355862
  • 16:05 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 16:04 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 16:04 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 16:03 moritzm: installing pillow security updates
  • 16:03 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 16:03 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P56545 and previous config saved to /var/cache/conftool/dbconfig/20240208-155833-root.json
  • 15:57 topranks: moving Netbox server uplinks from asw-a3-codfw to lsw1-a3-codfw to prep config for server moves T355862
  • 15:57 Dreamy_Jazz: Running `foreachwikiindblist group0.dblist extensions/MediaModeration/maintenance/scanFilesInScanTable.php --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-group0-sleep-30-thumbor.txt`
  • 15:57 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: wmcs::openstack::codfw1dev::cloudweb
  • 15:56 claime: Depooled mw2379|mw2380|mw2382|mw2383|mw2384|mw2385|mw2386|mw2387|mw2388|mw2389|mw2390|mw2391|mw2392|mw2393|mw2394|mw2396|mw2397|mw2398|mw2399|mw2400|mw2298|mw2299|mw2300 - T355862
  • 15:55 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=testwiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-testwiki-sleep-30-no-render-now.txt`
  • 15:55 cgoubert@cumin2002: conftool action : set/pooled=inactive; selector: name=(mw2379|mw2380|mw2382|mw2383|mw2384|mw2385|mw2386|mw2387|mw2388|mw2389|mw2390|mw2391|mw2392|mw2393|mw2394|mw2396|mw2397|mw2398|mw2399|mw2400|mw2298|mw2299|mw2300).*
  • 15:54 dreamyjazz@deploy2002: Finished scap: Backport for Follow-up: MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047), Follow-up: MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047) (duration: 08m 03s)
  • 15:50 taavi@cumin1002: START - Cookbook sre.puppet.migrate-role for role: wmcs::openstack::codfw1dev::cloudweb
  • 15:49 claime: Draining mw2377.codfw.wmnet mw2378.codfw.wmnet mw2381.codfw.wmnet mw2395.codfw.wmnet mw2291.codfw.wmnet mw2292.codfw.wmnet mw2293.codfw.wmnet mw2294.codfw.wmnet mw2295.codfw.wmnet mw2296.codfw.wmnet mw2297.codfw.wmnet - T355862
  • 15:48 claime: Draining mw2377.codfw.wmnet mw2378.codfw.wmnet mw2381.codfw.wmnet mw2395.codfw.wmnet mw2291.codfw.wmnet mw2292.codfw.wmnet mw2293.codfw.wmnet mw2294.codfw.wmnet mw2295.codfw.wmnet mw2296.codfw.wmnet mw2297.codfw.wmnet - T355870
  • 15:47 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 15:47 dreamyjazz@deploy2002: dreamyjazz: Backport for Follow-up: MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047), Follow-up: MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:47 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 15:47 hnowlan@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mw2282.codfw.wmnet with OS bullseye
  • 15:47 Dreamy_Jazz: Stopped mediamoderation scanning script
  • 15:46 dreamyjazz@deploy2002: Started scap: Backport for Follow-up: MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047), Follow-up: MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047)
  • 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T355609)', diff saved to https://phabricator.wikimedia.org/P56544 and previous config saved to /var/cache/conftool/dbconfig/20240208-154452-marostegui.json
  • 15:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 15:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 15:41 marostegui: dbmaint Schema change on s3@codfw T356988
  • 15:39 marostegui: dbmaint Schema change on s4@codfw T356988
  • 15:38 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T355609)', diff saved to https://phabricator.wikimedia.org/P56543 and previous config saved to /var/cache/conftool/dbconfig/20240208-152511-marostegui.json
  • 15:20 Dreamy_Jazz: Afternoon backport window done
  • 15:17 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=commonswiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-commonswiki-sleep-30-no-render-now.txt` on a tmux session - See https://wikitech.wikimedia.org/wiki/MediaModeration
  • 15:17 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2282.codfw.wmnet with OS bullseye
  • 15:17 dreamyjazz@deploy2002: Finished scap: Backport for MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047) (duration: 08m 42s)
  • 15:13 marostegui: dbmaint Schema change on s5@codfw T356988
  • 15:10 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P56542 and previous config saved to /var/cache/conftool/dbconfig/20240208-151005-marostegui.json
  • 15:09 dreamyjazz@deploy2002: dreamyjazz: Backport for MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:08 marostegui: dbmaint Schema change on s7@codfw T356988
  • 15:08 dreamyjazz@deploy2002: Started scap: Backport for MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047)
  • 15:05 marostegui: dbmaint (retroactive logging) Schema change on s7@codfw T356987
  • 15:05 Dreamy_Jazz: Stopped mediamoderation scanning script for commonswiki
  • 15:04 Dreamy_Jazz: testwiki scan finished
  • 15:03 marostegui: dbmaint Schema change on s8@codfw T356988
  • 15:03 marostegui: dbmaint Schema change on s6@codfw T356988
  • 15:03 marostegui: dbmaint Schema change on s2@codfw T356988
  • 15:03 marostegui: dbmaint Schema change on s1@codfw T356988
  • 14:55 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/scanFilesInScanTable.php --wiki=testwiki --use-jobqueue --sleep 30 --verbose 2>&1 | tee ~/scan-files-in-scan-table-testwiki-sleep-30-no-render-now.txt`
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P56541 and previous config saved to /var/cache/conftool/dbconfig/20240208-145457-marostegui.json
  • 14:54 dreamyjazz@deploy2002: Finished scap: Backport for MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047) (duration: 07m 49s)
  • 14:48 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
  • 14:48 dreamyjazz@deploy2002: dreamyjazz: Backport for MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:46 dreamyjazz@deploy2002: Started scap: Backport for MediaModerationImageContentsLookup: use proxied HTTP request to generate file (T356047)
  • 14:46 dreamyjazz@deploy2002: Finished scap: Backport for Add edit_interaction stream config for iOS (T355265) (duration: 10m 12s)
  • 14:40 dreamyjazz@deploy2002: tsev and dreamyjazz: Continuing with sync
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T355609)', diff saved to https://phabricator.wikimedia.org/P56540 and previous config saved to /var/cache/conftool/dbconfig/20240208-143951-marostegui.json
  • 14:37 dreamyjazz@deploy2002: tsev and dreamyjazz: Backport for Add edit_interaction stream config for iOS (T355265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:35 dreamyjazz@deploy2002: Started scap: Backport for Add edit_interaction stream config for iOS (T355265)
  • 14:35 dreamyjazz@deploy2002: Finished scap: Backport for Parser: Fix the main loop getting stuck on some signatures (T356884) (duration: 08m 29s)
  • 14:29 dreamyjazz@deploy2002: dreamyjazz and matmarex: Continuing with sync
  • 14:28 eoghan@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM vrts1002.eqiad.wmnet
  • 14:28 dreamyjazz@deploy2002: dreamyjazz and matmarex: Backport for Parser: Fix the main loop getting stuck on some signatures (T356884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:26 dreamyjazz@deploy2002: Started scap: Backport for Parser: Fix the main loop getting stuck on some signatures (T356884)
  • 14:26 dreamyjazz@deploy2002: Finished scap: Backport for Parser: Fix the main loop getting stuck on some signatures (T356884) (duration: 09m 36s)
  • 14:19 dreamyjazz@deploy2002: dreamyjazz and matmarex: Continuing with sync
  • 14:19 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb2002-dev.wikimedia.org with OS bullseye
  • 14:18 dreamyjazz@deploy2002: dreamyjazz and matmarex: Backport for Parser: Fix the main loop getting stuck on some signatures (T356884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:16 dreamyjazz@deploy2002: Started scap: Backport for Parser: Fix the main loop getting stuck on some signatures (T356884)
  • 14:13 eoghan@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM vrts1002.eqiad.wmnet
  • 14:13 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 14:07 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 14:07 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 13:58 Emperor: disable puppet and stop swift on ms-be10[44-50] T353149
  • 13:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on 7 hosts with reason: due for decomm
  • 13:57 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on 7 hosts with reason: due for decomm
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T355609)', diff saved to https://phabricator.wikimedia.org/P56539 and previous config saved to /var/cache/conftool/dbconfig/20240208-135142-marostegui.json
  • 13:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T355609)', diff saved to https://phabricator.wikimedia.org/P56538 and previous config saved to /var/cache/conftool/dbconfig/20240208-134243-marostegui.json
  • 13:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb2002-dev.wikimedia.org with reason: host reimage
  • 13:35 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb2002-dev.wikimedia.org with reason: host reimage
  • 13:31 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 13:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P56537 and previous config saved to /var/cache/conftool/dbconfig/20240208-132736-marostegui.json
  • 13:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 13:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudweb2002-dev.wikimedia.org with OS bullseye
  • 13:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P56536 and previous config saved to /var/cache/conftool/dbconfig/20240208-131229-marostegui.json
  • 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T355609)', diff saved to https://phabricator.wikimedia.org/P56535 and previous config saved to /var/cache/conftool/dbconfig/20240208-125723-marostegui.json
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetserver1003.eqiad.wmnet
  • 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetserver1003.eqiad.wmnet
  • 12:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168:3317 (T355609)', diff saved to https://phabricator.wikimedia.org/P56534 and previous config saved to /var/cache/conftool/dbconfig/20240208-123343-marostegui.json
  • 12:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T355609)', diff saved to https://phabricator.wikimedia.org/P56533 and previous config saved to /var/cache/conftool/dbconfig/20240208-123320-marostegui.json
  • 12:21 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6a64b3d]: restbase: Disable parsoid storage for jawiki (duration: 15m 49s)
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P56532 and previous config saved to /var/cache/conftool/dbconfig/20240208-121813-marostegui.json
  • 12:05 jgiannelos@deploy2002: Started deploy [restbase/deploy@6a64b3d]: restbase: Disable parsoid storage for jawiki
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P56531 and previous config saved to /var/cache/conftool/dbconfig/20240208-120306-marostegui.json
  • 12:01 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:58 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:58 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T355609)', diff saved to https://phabricator.wikimedia.org/P56530 and previous config saved to /var/cache/conftool/dbconfig/20240208-114759-marostegui.json
  • 11:41 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T355609)', diff saved to https://phabricator.wikimedia.org/P56529 and previous config saved to /var/cache/conftool/dbconfig/20240208-113707-marostegui.json
  • 11:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T355609)', diff saved to https://phabricator.wikimedia.org/P56528 and previous config saved to /var/cache/conftool/dbconfig/20240208-113630-marostegui.json
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P56527 and previous config saved to /var/cache/conftool/dbconfig/20240208-112123-marostegui.json
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P56526 and previous config saved to /var/cache/conftool/dbconfig/20240208-110616-marostegui.json
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T355609)', diff saved to https://phabricator.wikimedia.org/P56525 and previous config saved to /var/cache/conftool/dbconfig/20240208-105110-marostegui.json
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:41 hnowlan@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T334488)
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T355609)', diff saved to https://phabricator.wikimedia.org/P56524 and previous config saved to /var/cache/conftool/dbconfig/20240208-104011-marostegui.json
  • 10:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:40 hnowlan@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T334488)
  • 10:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T355609)', diff saved to https://phabricator.wikimedia.org/P56523 and previous config saved to /var/cache/conftool/dbconfig/20240208-103949-marostegui.json
  • 10:39 hnowlan@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T334488)
  • 10:38 hnowlan@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T334488)
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P56522 and previous config saved to /var/cache/conftool/dbconfig/20240208-102442-marostegui.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P56521 and previous config saved to /var/cache/conftool/dbconfig/20240208-100936-marostegui.json
  • 10:07 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:06 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:05 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:03 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:01 jiji@cumin1002: conftool action : set/pooled=inactive; selector: service=kubesvc,name=mw2282.codfw.wmnet
  • 10:01 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 10:00 jiji@cumin1002: conftool action : set/pooled=no; selector: service=kubesvc,name=mw2282.codfw.wmnet
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T355609)', diff saved to https://phabricator.wikimedia.org/P56520 and previous config saved to /var/cache/conftool/dbconfig/20240208-095429-marostegui.json
  • 09:36 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 09:34 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 09:21 vgutierrez@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir and not P{ncredir2.*} and A:ncredir
  • 09:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
  • 09:01 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T355609)', diff saved to https://phabricator.wikimedia.org/P56518 and previous config saved to /var/cache/conftool/dbconfig/20240208-085357-marostegui.json
  • 08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T355609)', diff saved to https://phabricator.wikimedia.org/P56517 and previous config saved to /var/cache/conftool/dbconfig/20240208-085334-marostegui.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P56516 and previous config saved to /var/cache/conftool/dbconfig/20240208-083827-marostegui.json
  • 08:37 urbanecm@deploy2002: Finished scap: Backport for Use real anonymous user in ComputedUserImpactLookup (T356895) (duration: 07m 49s)
  • 08:29 urbanecm@deploy2002: Started scap: Backport for Use real anonymous user in ComputedUserImpactLookup (T356895)
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P56515 and previous config saved to /var/cache/conftool/dbconfig/20240208-082544-root.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P56514 and previous config saved to /var/cache/conftool/dbconfig/20240208-082320-marostegui.json
  • 08:19 vgutierrez@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir and not P{ncredir2.*} and A:ncredir
  • 08:17 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P56513 and previous config saved to /var/cache/conftool/dbconfig/20240208-081039-root.json
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T355609)', diff saved to https://phabricator.wikimedia.org/P56512 and previous config saved to /var/cache/conftool/dbconfig/20240208-080814-marostegui.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T355609)', diff saved to https://phabricator.wikimedia.org/P56511 and previous config saved to /var/cache/conftool/dbconfig/20240208-075549-marostegui.json
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P56510 and previous config saved to /var/cache/conftool/dbconfig/20240208-075534-root.json
  • 07:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T355609)', diff saved to https://phabricator.wikimedia.org/P56509 and previous config saved to /var/cache/conftool/dbconfig/20240208-075526-marostegui.json
  • 07:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:49 vgutierrez: reboot ncredir2002 to validate https://gerrit.wikimedia.org/r/c/operations/puppet/+/998438
  • 07:45 vgutierrez: repool ncredir2001
  • 07:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P56508 and previous config saved to /var/cache/conftool/dbconfig/20240208-074029-root.json
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P56507 and previous config saved to /var/cache/conftool/dbconfig/20240208-074019-marostegui.json
  • 07:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2140 as able to serve API', diff saved to https://phabricator.wikimedia.org/P56506 and previous config saved to /var/cache/conftool/dbconfig/20240208-072808-arnaudb.json
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P56505 and previous config saved to /var/cache/conftool/dbconfig/20240208-072523-root.json
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P56504 and previous config saved to /var/cache/conftool/dbconfig/20240208-072512-marostegui.json
  • 07:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2140 T355658', diff saved to https://phabricator.wikimedia.org/P56503 and previous config saved to /var/cache/conftool/dbconfig/20240208-071916-arnaudb.json
  • 07:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2179 to s4 primary and set section read-write T355658', diff saved to https://phabricator.wikimedia.org/P56502 and previous config saved to /var/cache/conftool/dbconfig/20240208-071559-arnaudb.json
  • 07:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T355658', diff saved to https://phabricator.wikimedia.org/P56501 and previous config saved to /var/cache/conftool/dbconfig/20240208-071414-arnaudb.json
  • 07:12 arnaudb: Starting s4 codfw failover from db2140 to db2179 - T355658
  • 07:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P56500 and previous config saved to /var/cache/conftool/dbconfig/20240208-071018-root.json
  • 07:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T355609)', diff saved to https://phabricator.wikimedia.org/P56499 and previous config saved to /var/cache/conftool/dbconfig/20240208-071006-marostegui.json
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2120 (T355609)', diff saved to https://phabricator.wikimedia.org/P56498 and previous config saved to /var/cache/conftool/dbconfig/20240208-065742-marostegui.json
  • 06:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 06:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T355609)', diff saved to https://phabricator.wikimedia.org/P56497 and previous config saved to /var/cache/conftool/dbconfig/20240208-065720-marostegui.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2032 back to es1 primary T351916', diff saved to https://phabricator.wikimedia.org/P56496 and previous config saved to /var/cache/conftool/dbconfig/20240208-065607-root.json
  • 06:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P56495 and previous config saved to /var/cache/conftool/dbconfig/20240208-065513-root.json
  • 06:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2179 with weight 0 T355658', diff saved to https://phabricator.wikimedia.org/P56494 and previous config saved to /var/cache/conftool/dbconfig/20240208-064802-arnaudb.json
  • 06:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s4 T355658
  • 06:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s4 T355658
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P56493 and previous config saved to /var/cache/conftool/dbconfig/20240208-064213-marostegui.json
  • 06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2032.codfw.wmnet with OS bookworm
  • 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P56492 and previous config saved to /var/cache/conftool/dbconfig/20240208-062706-marostegui.json
  • 06:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2032.codfw.wmnet with reason: host reimage
  • 06:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2032.codfw.wmnet with reason: host reimage
  • 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T355609)', diff saved to https://phabricator.wikimedia.org/P56491 and previous config saved to /var/cache/conftool/dbconfig/20240208-061200-marostegui.json
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2032.codfw.wmnet with OS bookworm
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2032 T351916', diff saved to https://phabricator.wikimedia.org/P56490 and previous config saved to /var/cache/conftool/dbconfig/20240208-060226-root.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2020 to es1 primary T351916', diff saved to https://phabricator.wikimedia.org/P56489 and previous config saved to /var/cache/conftool/dbconfig/20240208-060204-root.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2108 (T355609)', diff saved to https://phabricator.wikimedia.org/P56488 and previous config saved to /var/cache/conftool/dbconfig/20240208-055944-marostegui.json
  • 05:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2103 es2020 T355862', diff saved to https://phabricator.wikimedia.org/P56487 and previous config saved to /var/cache/conftool/dbconfig/20240208-055316-root.json
  • 05:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 02:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2198.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2196.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2197.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2198.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2198 to codfw - jhancock@cumin2002"
  • 02:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2198 to codfw - jhancock@cumin2002"
  • 02:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 02:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2197.mgmt.codfw.wmnet with reboot policy FORCED
  • 02:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2197 to codfw - jhancock@cumin2002"
  • 02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2197 to codfw - jhancock@cumin2002"
  • 02:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 01:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2196.mgmt.codfw.wmnet with reboot policy FORCED
  • 01:57 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2196 to codfw - jhancock@cumin2002"
  • 01:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2196 to codfw - jhancock@cumin2002"
  • 01:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 00:43 ejegg: fundraising civicrm upgraded from 98d35c79 to c66b04bd

2024-02-07

  • 23:54 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 22:49 brett: Uploaded ncmonitor 0.0.2 to bookworm-wikimedia archive
  • 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T347624, testing 961878 patch) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 22:46 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T347624, testing 961878 patch) xfer categories from wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
  • 22:16 ebernhardson@deploy2002: Finished scap: Backport for cirrus: Re-enable writes to wikidata on cloudelastic (T352335) (duration: 09m 10s)
  • 22:10 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 22:09 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Re-enable writes to wikidata on cloudelastic (T352335) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 22:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 22:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T355609)', diff saved to https://phabricator.wikimedia.org/P56485 and previous config saved to /var/cache/conftool/dbconfig/20240207-220824-marostegui.json
  • 22:07 ebernhardson@deploy2002: Started scap: Backport for cirrus: Re-enable writes to wikidata on cloudelastic (T352335)
  • 22:07 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 21:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P56484 and previous config saved to /var/cache/conftool/dbconfig/20240207-215317-marostegui.json
  • 21:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P56483 and previous config saved to /var/cache/conftool/dbconfig/20240207-213810-marostegui.json
  • 21:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T355609)', diff saved to https://phabricator.wikimedia.org/P56482 and previous config saved to /var/cache/conftool/dbconfig/20240207-212304-marostegui.json
  • 21:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T355609)', diff saved to https://phabricator.wikimedia.org/P56481 and previous config saved to /var/cache/conftool/dbconfig/20240207-211803-marostegui.json
  • 21:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T355609)', diff saved to https://phabricator.wikimedia.org/P56480 and previous config saved to /var/cache/conftool/dbconfig/20240207-211741-marostegui.json
  • 21:09 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1008.eqiad.wmnet with OS bullseye
  • 21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P56479 and previous config saved to /var/cache/conftool/dbconfig/20240207-210235-marostegui.json
  • 20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P56478 and previous config saved to /var/cache/conftool/dbconfig/20240207-204728-marostegui.json
  • 20:43 brennen@deploy2002: Finished scap: Backport for Fix regression in HLS track content type (T356780) (duration: 10m 20s)
  • 20:37 brennen@deploy2002: brennen: Continuing with sync
  • 20:37 brennen@deploy2002: brennen: Backport for Fix regression in HLS track content type (T356780) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:33 brennen@deploy2002: Started scap: Backport for Fix regression in HLS track content type (T356780)
  • 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T355609)', diff saved to https://phabricator.wikimedia.org/P56477 and previous config saved to /var/cache/conftool/dbconfig/20240207-203222-marostegui.json
  • 20:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T355609)', diff saved to https://phabricator.wikimedia.org/P56475 and previous config saved to /var/cache/conftool/dbconfig/20240207-202123-marostegui.json
  • 20:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 20:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 20:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T355609)', diff saved to https://phabricator.wikimedia.org/P56474 and previous config saved to /var/cache/conftool/dbconfig/20240207-202101-marostegui.json
  • 20:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
  • 20:09 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 20:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1008 to private IPs - bking@cumin2002"
  • 20:06 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1008 to private IPs - bking@cumin2002"
  • 20:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P56473 and previous config saved to /var/cache/conftool/dbconfig/20240207-200555-marostegui.json
  • 20:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@ea0a3db]: Analytics Hotfix [airflow-dags/analytics@ea0a3db2] (duration: 00m 40s)
  • 20:03 joal@deploy2002: Started deploy [airflow-dags/analytics@ea0a3db]: Analytics Hotfix [airflow-dags/analytics@ea0a3db2]
  • 20:00 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudelastic1008.wikimedia.org
  • 19:56 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:56 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1008.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 19:55 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1008.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 19:52 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 19:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P56472 and previous config saved to /var/cache/conftool/dbconfig/20240207-195047-marostegui.json
  • 19:47 joal@deploy2002: Finished deploy [analytics/refinery@80b329b] (hadoop-test): Analytics Hotfix - TEST [analytics/refinery@80b329b5] (duration: 03m 40s)
  • 19:45 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic1008.wikimedia.org
  • 19:43 joal@deploy2002: Started deploy [analytics/refinery@80b329b] (hadoop-test): Analytics Hotfix - TEST [analytics/refinery@80b329b5]
  • 19:42 joal@deploy2002: Finished deploy [analytics/refinery@80b329b] (thin): Analytics Hotfix -THIN [analytics/refinery@80b329b5] (duration: 00m 05s)
  • 19:42 joal@deploy2002: Started deploy [analytics/refinery@80b329b] (thin): Analytics Hotfix -THIN [analytics/refinery@80b329b5]
  • 19:42 joal@deploy2002: Finished deploy [analytics/refinery@80b329b]: Analytics Hotfix [analytics/refinery@80b329b5] (duration: 10m 28s)
  • 19:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T355609)', diff saved to https://phabricator.wikimedia.org/P56471 and previous config saved to /var/cache/conftool/dbconfig/20240207-193540-marostegui.json
  • 19:32 joal@deploy2002: Started deploy [analytics/refinery@80b329b]: Analytics Hotfix [analytics/refinery@80b329b5]
  • 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T355609)', diff saved to https://phabricator.wikimedia.org/P56470 and previous config saved to /var/cache/conftool/dbconfig/20240207-193016-marostegui.json
  • 19:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T355609)', diff saved to https://phabricator.wikimedia.org/P56469 and previous config saved to /var/cache/conftool/dbconfig/20240207-192953-marostegui.json
  • 19:19 mutante: people1004 systemctl stop confd; running puppet; checking to remove confd remnants from people* hosts - T356296
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P56468 and previous config saved to /var/cache/conftool/dbconfig/20240207-191446-marostegui.json
  • 19:01 brennen: train 1.42.0-wmf.17 (T354435): a couple of blockers currently, waiting on resolution before rolling
  • 18:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P56467 and previous config saved to /var/cache/conftool/dbconfig/20240207-185940-marostegui.json
  • 18:49 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 18:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T355609)', diff saved to https://phabricator.wikimedia.org/P56466 and previous config saved to /var/cache/conftool/dbconfig/20240207-184433-marostegui.json
  • 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T355609)', diff saved to https://phabricator.wikimedia.org/P56465 and previous config saved to /var/cache/conftool/dbconfig/20240207-183912-marostegui.json
  • 18:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T355609)', diff saved to https://phabricator.wikimedia.org/P56464 and previous config saved to /var/cache/conftool/dbconfig/20240207-183849-marostegui.json
  • 18:30 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 18:25 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P56463 and previous config saved to /var/cache/conftool/dbconfig/20240207-182342-marostegui.json
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P56462 and previous config saved to /var/cache/conftool/dbconfig/20240207-180835-marostegui.json
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T355609)', diff saved to https://phabricator.wikimedia.org/P56461 and previous config saved to /var/cache/conftool/dbconfig/20240207-175328-marostegui.json
  • 17:52 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 17:52 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 17:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T355609)', diff saved to https://phabricator.wikimedia.org/P56460 and previous config saved to /var/cache/conftool/dbconfig/20240207-174807-marostegui.json
  • 17:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 17:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 17:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T355609)', diff saved to https://phabricator.wikimedia.org/P56459 and previous config saved to /var/cache/conftool/dbconfig/20240207-174745-marostegui.json
  • 17:32 jgiannelos@deploy2002: Finished deploy [restbase/deploy@1007273]: Disabling storage for jawiki (duration: 07m 19s)
  • 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P56458 and previous config saved to /var/cache/conftool/dbconfig/20240207-173238-marostegui.json
  • 17:26 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
  • 17:25 jgiannelos@deploy2002: Started deploy [restbase/deploy@1007273]: Disabling storage for jawiki
  • 17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P56457 and previous config saved to /var/cache/conftool/dbconfig/20240207-171732-marostegui.json
  • 17:11 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor
  • 17:04 sbailey@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 17:04 sbailey@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 17:03 sbailey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 17:03 sbailey@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T355609)', diff saved to https://phabricator.wikimedia.org/P56456 and previous config saved to /var/cache/conftool/dbconfig/20240207-170225-marostegui.json
  • 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T355609)', diff saved to https://phabricator.wikimedia.org/P56455 and previous config saved to /var/cache/conftool/dbconfig/20240207-165703-marostegui.json
  • 16:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:55 sbailey@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 16:54 sbailey@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 16:52 hnowlan@cumin2002: conftool action : set/pooled=yes; selector: name=(mw2377.codfw.wmnet|mw2378.codfw.wmnet|mw2406.codfw.wmnet|mw2301.codfw.wmnet|mw2310.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 16:52 hnowlan@cumin2002: conftool action : set/weight=10; selector: name=(mw2377.codfw.wmnet|mw2378.codfw.wmnet|mw2406.codfw.wmnet|mw2301.codfw.wmnet|mw2310.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 16:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T355609)', diff saved to https://phabricator.wikimedia.org/P56454 and previous config saved to /var/cache/conftool/dbconfig/20240207-164738-marostegui.json
  • 16:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw-a-codfw,cr[1-2]-codfw,lsw1-a2-codfw.mgmt
  • 16:47 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for asw-a-codfw,cr[1-2]-codfw,lsw1-a2-codfw.mgmt
  • 16:47 ejegg: fundraising civicrm upgraded from c3dff157 to 98d35c79
  • 16:46 hnowlan: homer 'cr*codfw*' commit 'T354791' for 5 new k8s ex-appservers
  • 16:39 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 16:35 sbailey@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:34 sbailey@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:33 sbailey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P56452 and previous config saved to /var/cache/conftool/dbconfig/20240207-163231-marostegui.json
  • 16:32 sbailey@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 16:25 sbailey@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:24 sbailey@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 16:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P56451 and previous config saved to /var/cache/conftool/dbconfig/20240207-161725-marostegui.json
  • 16:17 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-cache2001.codfw.wmnet
  • 16:16 klausman@cumin2002: START - Cookbook sre.hosts.remove-downtime for ml-cache2001.codfw.wmnet
  • 16:16 Emperor: repool codfw dnsdisc T355861
  • 16:16 mvernon@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=swift,name=codfw
  • 16:16 Emperor: repool thanos-fe2001 T355861
  • 16:15 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki ukwiki --current --all --touched-after=20230613000000 --start '["1685316"]' | tee ~/T315510-ukwiki # in tmux
  • 16:10 herron: hard reboot titan1002
  • 16:07 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 16:05 jelto: import etherpad-lite 1.9.7-1 on apt1001 host - T316421
  • 16:04 vgutierrez: <topranks> Commencing server uplink moves from old switch to new in codfw rack A2 T355861
  • 16:03 Lucas_WMDE: STOP persistRevisionThreadItems on rowiki for T315510 – according to T315510#9328399, it should be done already (it was at --start '["2075226"]' and had processed 31000, updated 0) [relog from 15:45, stashbot was down]
  • 15:42 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2377.codfw.wmnet with OS bullseye
  • 15:40 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --start '["67578461"]' | tee ~/T315510-enwiki # in tmux
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P56447 and previous config saved to /var/cache/conftool/dbconfig/20240207-153656-marostegui.json
  • 15:34 Lucas_WMDE: backport+config window done
  • 15:33 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2378.codfw.wmnet with reason: host reimage
  • 15:32 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for WebVideoTranscodeJob: also add time limits (T356780) (duration: 07m 48s)
  • 15:31 Lucas_WMDE: STOP persistRevisionThreadItems on frwiki for T315510 – 100% CPU usage, 15G RAM and counting, no progress output: clearly stuck on something
  • 15:30 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2406.codfw.wmnet with reason: host reimage
  • 15:28 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2301.codfw.wmnet with reason: host reimage
  • 15:26 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and oblivian: Continuing with sync
  • 15:26 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and oblivian: Backport for WebVideoTranscodeJob: also add time limits (T356780) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:25 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2310.codfw.wmnet with reason: host reimage
  • 15:24 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for WebVideoTranscodeJob: also add time limits (T356780)
  • 15:22 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2377.codfw.wmnet with reason: host reimage
  • 15:21 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2310.codfw.wmnet with reason: host reimage
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P56446 and previous config saved to /var/cache/conftool/dbconfig/20240207-152150-marostegui.json
  • 15:21 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2301.codfw.wmnet with reason: host reimage
  • 15:21 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2406.codfw.wmnet with reason: host reimage
  • 15:20 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2378.codfw.wmnet with reason: host reimage
  • 15:20 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2377.codfw.wmnet with reason: host reimage
  • 15:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2044-2050].codfw.wmnet
  • 15:13 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2044-2050].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 15:12 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki rowiki --current --all --touched-after=20230613000000 --start '["2041962"]' | tee ~/T315510-rowiki # in tmux
  • 15:10 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2044-2050].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
  • 15:07 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T355609)', diff saved to https://phabricator.wikimedia.org/P56445 and previous config saved to /var/cache/conftool/dbconfig/20240207-150643-marostegui.json
  • 15:05 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki frwiki --current --all --touched-after=20230613000000 --start '["7544396"]' | tee ~/T315510-frwiki # in tmux
  • 15:05 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2310.codfw.wmnet with OS bullseye
  • 15:05 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2301.codfw.wmnet with OS bullseye
  • 15:05 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2406.codfw.wmnet with OS bullseye
  • 15:04 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2378.codfw.wmnet with OS bullseye
  • 15:04 Lucas_WMDE: STOP script for T315510, forgot to tee it somewhere useful
  • 15:04 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host mw2377.codfw.wmnet with OS bullseye
  • 15:02 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint2002:~$ mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki frwiki --current --all --touched-after=20230613000000 --start '["7544396"]' # T315510, in tmux
  • 15:01 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-role (exit_code=99) for role: elasticsearch::cirrus
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T355609)', diff saved to https://phabricator.wikimedia.org/P56444 and previous config saved to /var/cache/conftool/dbconfig/20240207-150121-marostegui.json
  • 15:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:58 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for ParserObserver: Limit the size of cache of previous parse traces (T351732), ParserObserver: Limit the size of cache of previous parse traces (T351732) (duration: 08m 08s)
  • 14:57 vgutierrez: reboot ncredir2001
  • 14:52 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and matmarex: Continuing with sync
  • 14:52 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and matmarex: Backport for ParserObserver: Limit the size of cache of previous parse traces (T351732), ParserObserver: Limit the size of cache of previous parse traces (T351732) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:50 vgutierrez: reboot ncredir2001
  • 14:50 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for ParserObserver: Limit the size of cache of previous parse traces (T351732), ParserObserver: Limit the size of cache of previous parse traces (T351732)
  • 14:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56443 and previous config saved to /var/cache/conftool/dbconfig/20240207-144822-arnaudb.json
  • 14:44 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ml-cache2001.codfw.wmnet with reason: Machine network link move (T355861)
  • 14:44 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ml-cache2001.codfw.wmnet with reason: Machine network link move (T355861)
  • 14:40 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2044-2050].codfw.wmnet
  • 14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56442 and previous config saved to /var/cache/conftool/dbconfig/20240207-143317-arnaudb.json
  • 14:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: elasticsearch::cirrus
  • 14:32 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 14:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-be2045.codfw.wmnet
  • 14:32 mvernon@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2045.codfw.wmnet
  • 14:32 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 14:29 volans: deploying debmonitor-client_0.3.5 fleet-wide
  • 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 100%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56441 and previous config saved to /var/cache/conftool/dbconfig/20240207-142423-arnaudb.json
  • 14:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 60%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56440 and previous config saved to /var/cache/conftool/dbconfig/20240207-141812-arnaudb.json
  • 14:17 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Fix PermissionException being logged (T356223), Fix PermissionException being logged (T356223) (duration: 08m 08s)
  • 14:11 logmsgbot: lucaswerkmeister-wmde@deploy2002 jforrester and lucaswerkmeister-wmde: Continuing with sync
  • 14:11 logmsgbot: lucaswerkmeister-wmde@deploy2002 jforrester and lucaswerkmeister-wmde: Backport for Fix PermissionException being logged (T356223), Fix PermissionException being logged (T356223) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:09 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Fix PermissionException being logged (T356223), Fix PermissionException being logged (T356223)
  • 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 75%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56439 and previous config saved to /var/cache/conftool/dbconfig/20240207-140918-arnaudb.json
  • 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 30%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56438 and previous config saved to /var/cache/conftool/dbconfig/20240207-140306-arnaudb.json
  • 13:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 60%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56437 and previous config saved to /var/cache/conftool/dbconfig/20240207-135412-arnaudb.json
  • 13:54 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:53 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:53 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:52 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:52 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:52 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 15%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56436 and previous config saved to /var/cache/conftool/dbconfig/20240207-134801-arnaudb.json
  • 13:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 30%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56435 and previous config saved to /var/cache/conftool/dbconfig/20240207-133907-arnaudb.json
  • 13:32 jmm@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=1) rolling reboot on A:ncredir
  • 13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'T344589 - depool es2024', diff saved to https://phabricator.wikimedia.org/P56434 and previous config saved to /var/cache/conftool/dbconfig/20240207-132559-arnaudb.json
  • 13:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2024.codfw.wmnet with reason: T344589 - kernel upgrade
  • 13:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es2024.codfw.wmnet with reason: T344589 - kernel upgrade
  • 13:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db2105 (re)pooling @ 15%: kernel upgrade done', diff saved to https://phabricator.wikimedia.org/P56433 and previous config saved to /var/cache/conftool/dbconfig/20240207-132402-arnaudb.json
  • 12:46 arnaudb@cumin1002: dbctl commit (dc=all): 'T344589 - depool db2105', diff saved to https://phabricator.wikimedia.org/P56432 and previous config saved to /var/cache/conftool/dbconfig/20240207-124605-arnaudb.json
  • 12:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: T344589 - kernel upgrade
  • 12:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: T344589 - kernel upgrade
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56431 and previous config saved to /var/cache/conftool/dbconfig/20240207-124409-marostegui.json
  • 12:35 hnowlan@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T334488)
  • 12:34 hnowlan@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T334488)
  • 12:33 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
  • 12:32 hnowlan@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T334488)
  • 12:31 hnowlan@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T334488)
  • 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P56430 and previous config saved to /var/cache/conftool/dbconfig/20240207-122903-marostegui.json
  • 12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
  • 12:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
  • 12:18 claime: trafficserver: move 40% of traffic to mw on k8s - T355532
  • 12:14 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:14 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
  • 12:14 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P56429 and previous config saved to /var/cache/conftool/dbconfig/20240207-121356-marostegui.json
  • 12:13 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:12 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:12 claime: mw-web, mw-api-ext: Raise replicas for 40% traffic - T355532
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
  • 12:02 volans: uploaded debmonitor-client_0.3.5 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-eqiad
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56428 and previous config saved to /var/cache/conftool/dbconfig/20240207-115849-marostegui.json
  • 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
  • 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
  • 11:56 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
  • 11:56 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-eqiad
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
  • 11:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
  • 11:48 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56427 and previous config saved to /var/cache/conftool/dbconfig/20240207-113339-marostegui.json
  • 11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T355609)', diff saved to https://phabricator.wikimedia.org/P56426 and previous config saved to /var/cache/conftool/dbconfig/20240207-113317-marostegui.json
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P56425 and previous config saved to /var/cache/conftool/dbconfig/20240207-111810-marostegui.json
  • 11:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::parsercache
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P56424 and previous config saved to /var/cache/conftool/dbconfig/20240207-110304-marostegui.json
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2109.codfw.wmnet
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2109.codfw.wmnet
  • 10:51 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::parsercache
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2108.codfw.wmnet
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T355609)', diff saved to https://phabricator.wikimedia.org/P56423 and previous config saved to /var/cache/conftool/dbconfig/20240207-104757-marostegui.json
  • 10:44 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2108.codfw.wmnet
  • 10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:37 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:36 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T355609)', diff saved to https://phabricator.wikimedia.org/P56422 and previous config saved to /var/cache/conftool/dbconfig/20240207-102535-marostegui.json
  • 10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T355609)', diff saved to https://phabricator.wikimedia.org/P56421 and previous config saved to /var/cache/conftool/dbconfig/20240207-102513-marostegui.json
  • 10:24 Dreamy_Jazz: Finished security deploys for T356183
  • 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 10:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1009.eqiad.wmnet
  • 10:19 logmsgbot: dreamyjazz Deployed security patch for T356183
  • 10:12 Dreamy_Jazz: Continuing security deploy for T356183
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P56420 and previous config saved to /var/cache/conftool/dbconfig/20240207-101006-marostegui.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P56419 and previous config saved to /var/cache/conftool/dbconfig/20240207-095500-marostegui.json
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic2107.codfw.wmnet
  • 09:45 logmsgbot: dreamyjazz Deployed security patch for T356183
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2107.codfw.wmnet
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T355609)', diff saved to https://phabricator.wikimedia.org/P56418 and previous config saved to /var/cache/conftool/dbconfig/20240207-093953-marostegui.json
  • 09:31 jayme: removing a bunch of old kernel versions from chartmuseum* to free ~3.5GB disk space
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: mariadb::core_test
  • 09:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P56417 and previous config saved to /var/cache/conftool/dbconfig/20240207-092614-root.json
  • 09:24 Dreamy_Jazz: Doing security deploy for T356183
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T355609)', diff saved to https://phabricator.wikimedia.org/P56416 and previous config saved to /var/cache/conftool/dbconfig/20240207-092248-marostegui.json
  • 09:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T355609)', diff saved to https://phabricator.wikimedia.org/P56415 and previous config saved to /var/cache/conftool/dbconfig/20240207-092210-marostegui.json
  • 09:19 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: mariadb::core_test
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P56414 and previous config saved to /var/cache/conftool/dbconfig/20240207-090703-marostegui.json
  • 09:03 arnaudb@cumin1002: dbctl commit (dc=all): 'matching old db1238 weight https://phabricator.wikimedia.org/P56404', diff saved to https://phabricator.wikimedia.org/P56413 and previous config saved to /var/cache/conftool/dbconfig/20240207-090316-arnaudb.json
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P56412 and previous config saved to /var/cache/conftool/dbconfig/20240207-085157-marostegui.json
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P56411 and previous config saved to /var/cache/conftool/dbconfig/20240207-084738-root.json
  • 08:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary T356649', diff saved to https://phabricator.wikimedia.org/P56410 and previous config saved to /var/cache/conftool/dbconfig/20240207-084654-arnaudb.json
  • 08:45 arnaudb: Starting s4 eqiad failover from db1160 to db1238 - T356649
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T355609)', diff saved to https://phabricator.wikimedia.org/P56409 and previous config saved to /var/cache/conftool/dbconfig/20240207-083650-marostegui.json
  • 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P56408 and previous config saved to /var/cache/conftool/dbconfig/20240207-083233-root.json
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P56407 and previous config saved to /var/cache/conftool/dbconfig/20240207-081727-root.json
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T355609)', diff saved to https://phabricator.wikimedia.org/P56406 and previous config saved to /var/cache/conftool/dbconfig/20240207-081433-marostegui.json
  • 08:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T355609)', diff saved to https://phabricator.wikimedia.org/P56405 and previous config saved to /var/cache/conftool/dbconfig/20240207-081410-marostegui.json
  • 08:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T356649', diff saved to https://phabricator.wikimedia.org/P56404 and previous config saved to /var/cache/conftool/dbconfig/20240207-081220-arnaudb.json
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P56403 and previous config saved to /var/cache/conftool/dbconfig/20240207-081213-root.json
  • 08:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 38 hosts with reason: Primary switchover s4 T356649
  • 08:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 38 hosts with reason: Primary switchover s4 T356649
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P56402 and previous config saved to /var/cache/conftool/dbconfig/20240207-080222-root.json
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 07:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
  • 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P56401 and previous config saved to /var/cache/conftool/dbconfig/20240207-075904-marostegui.json
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 07:57 oblivian@deploy2002: Finished scap: Backport for Set the memory limit in bytes. (T356780) (duration: 09m 36s)
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P56400 and previous config saved to /var/cache/conftool/dbconfig/20240207-075708-root.json
  • 07:51 moritzm: rebalance ganeti codfw/row B following completed switch maintenance T355860
  • 07:50 oblivian@deploy2002: oblivian and jforrester: Continuing with sync
  • 07:49 oblivian@deploy2002: oblivian and jforrester: Backport for Set the memory limit in bytes. (T356780) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:47 oblivian@deploy2002: Started scap: Backport for Set the memory limit in bytes. (T356780)
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P56399 and previous config saved to /var/cache/conftool/dbconfig/20240207-074717-root.json
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P56398 and previous config saved to /var/cache/conftool/dbconfig/20240207-074357-marostegui.json
  • 07:42 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P56397 and previous config saved to /var/cache/conftool/dbconfig/20240207-074203-root.json
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P56396 and previous config saved to /var/cache/conftool/dbconfig/20240207-073212-root.json
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T355609)', diff saved to https://phabricator.wikimedia.org/P56395 and previous config saved to /var/cache/conftool/dbconfig/20240207-072851-marostegui.json
  • 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P56394 and previous config saved to /var/cache/conftool/dbconfig/20240207-072657-root.json
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P56393 and previous config saved to /var/cache/conftool/dbconfig/20240207-071707-root.json
  • 07:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1032.eqiad.wmnet with OS bookworm
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P56392 and previous config saved to /var/cache/conftool/dbconfig/20240207-071152-root.json
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2109 (T355609)', diff saved to https://phabricator.wikimedia.org/P56391 and previous config saved to /var/cache/conftool/dbconfig/20240207-070007-marostegui.json
  • 07:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T355609)', diff saved to https://phabricator.wikimedia.org/P56390 and previous config saved to /var/cache/conftool/dbconfig/20240207-065944-marostegui.json
  • 06:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: host reimage
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P56389 and previous config saved to /var/cache/conftool/dbconfig/20240207-065647-root.json
  • 06:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: host reimage
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P56388 and previous config saved to /var/cache/conftool/dbconfig/20240207-064438-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P56387 and previous config saved to /var/cache/conftool/dbconfig/20240207-064142-root.json
  • 06:41 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1032.eqiad.wmnet with OS bookworm
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032', diff saved to https://phabricator.wikimedia.org/P56386 and previous config saved to /var/cache/conftool/dbconfig/20240207-063957-root.json
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Switch es1 master T351916', diff saved to https://phabricator.wikimedia.org/P56385 and previous config saved to /var/cache/conftool/dbconfig/20240207-063659-marostegui.json
  • 06:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2030.codfw.wmnet with OS bookworm
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P56384 and previous config saved to /var/cache/conftool/dbconfig/20240207-062931-marostegui.json
  • 06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2030.codfw.wmnet with reason: host reimage
  • 06:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2030.codfw.wmnet with reason: host reimage
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T355609)', diff saved to https://phabricator.wikimedia.org/P56383 and previous config saved to /var/cache/conftool/dbconfig/20240207-061424-marostegui.json
  • 05:55 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es2030.codfw.wmnet with OS bookworm
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P56382 and previous config saved to /var/cache/conftool/dbconfig/20240207-055301-root.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2105 (T355609)', diff saved to https://phabricator.wikimedia.org/P56381 and previous config saved to /var/cache/conftool/dbconfig/20240207-055210-marostegui.json
  • 05:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 05:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 02:11 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Wikimedia Apps/Suggested edits" "Wikimedia Apps/Android Suggested edits" "Zabe" --reason "per request T348875"
  • 02:10 zabe@deploy2002: Finished scap: Backport for Deleting Ns:104 in itwikivoyage, throttle: Remove expired throttle (duration: 08m 22s)
  • 02:04 zabe@deploy2002: caenus and zabe: Continuing with sync
  • 02:03 zabe@deploy2002: caenus and zabe: Backport for Deleting Ns:104 in itwikivoyage, throttle: Remove expired throttle synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 02:02 zabe@deploy2002: Started scap: Backport for Deleting Ns:104 in itwikivoyage, throttle: Remove expired throttle
  • 01:30 zabe@deploy2002: Finished scap: Backport for Update mediawiki/mediawiki-codesniffer to 43.0.0 (duration: 08m 25s)
  • 01:24 zabe@deploy2002: zabe: Continuing with sync
  • 01:23 zabe@deploy2002: zabe: Backport for Update mediawiki/mediawiki-codesniffer to 43.0.0 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 01:22 zabe@deploy2002: Started scap: Backport for Update mediawiki/mediawiki-codesniffer to 43.0.0
  • 00:51 tzatziki: removing 21 files for legal compliance

2024-02-06

  • 23:24 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@034ea4b]: (no justification provided) (duration: 00m 27s)
  • 23:24 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@034ea4b]: (no justification provided)
  • 23:18 brennen@deploy2002: Finished scap: Backport for WebRequest: Fix default for backwards compat (T356800) (duration: 09m 02s)
  • 23:11 brennen@deploy2002: taavi and brennen: Continuing with sync
  • 23:10 brennen@deploy2002: taavi and brennen: Backport for WebRequest: Fix default for backwards compat (T356800) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:09 brennen@deploy2002: Started scap: Backport for WebRequest: Fix default for backwards compat (T356800)
  • 22:27 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@11e5c60]: (no justification provided) (duration: 00m 28s)
  • 22:27 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@11e5c60]: (no justification provided)
  • 22:01 cjming: end of UTC late backport window
  • 22:01 cjming@deploy2002: Finished scap: Backport for Reduce font size of diff heading (T356728) (duration: 08m 24s)
  • 21:54 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:54 cjming@deploy2002: cjming and jdlrobson: Backport for Reduce font size of diff heading (T356728) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:52 cjming@deploy2002: Started scap: Backport for Reduce font size of diff heading (T356728)
  • 21:49 ejegg: fundraising civicrm upgraded from 684eb057 to c3dff157
  • 21:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:37 cjming@deploy2002: Finished scap: Backport for Reduce font size of diff heading (T356728) (duration: 08m 37s)
  • 21:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 21:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 21:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T355609)', diff saved to https://phabricator.wikimedia.org/P56380 and previous config saved to /var/cache/conftool/dbconfig/20240206-213621-marostegui.json
  • 21:31 cjming@deploy2002: cjming and jdlrobson: Continuing with sync
  • 21:30 cjming@deploy2002: cjming and jdlrobson: Backport for Reduce font size of diff heading (T356728) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:29 cjming@deploy2002: Started scap: Backport for Reduce font size of diff heading (T356728)
  • 21:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P56379 and previous config saved to /var/cache/conftool/dbconfig/20240206-212114-marostegui.json
  • 21:07 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@916bff2]: (no justification provided) (duration: 00m 29s)
  • 21:07 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@916bff2]: (no justification provided)
  • 21:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P56378 and previous config saved to /var/cache/conftool/dbconfig/20240206-210607-marostegui.json
  • 20:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T355609)', diff saved to https://phabricator.wikimedia.org/P56377 and previous config saved to /var/cache/conftool/dbconfig/20240206-205101-marostegui.json
  • 20:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T355609)', diff saved to https://phabricator.wikimedia.org/P56376 and previous config saved to /var/cache/conftool/dbconfig/20240206-204115-marostegui.json
  • 20:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 20:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T355609)', diff saved to https://phabricator.wikimedia.org/P56375 and previous config saved to /var/cache/conftool/dbconfig/20240206-204053-marostegui.json
  • 20:27 bking@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cloudelastic,name=cloudelastic1009.eqiad.wmnet
  • 20:27 bking@cumin2002: conftool action : set/weight=10; selector: name=cloudelastic1009.eqiad.wmnet
  • 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P56374 and previous config saved to /var/cache/conftool/dbconfig/20240206-202546-marostegui.json
  • 20:21 joal@deploy2002: Finished deploy [airflow-dags/analytics@09b8dc5]: Regular analytics weekly train [airflow-dags/analytics@09b8dc55] (duration: 00m 28s)
  • 20:21 joal@deploy2002: Started deploy [airflow-dags/analytics@09b8dc5]: Regular analytics weekly train [airflow-dags/analytics@09b8dc55]
  • 20:10 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@5f38647]: (no justification provided) (duration: 00m 27s)
  • 20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P56373 and previous config saved to /var/cache/conftool/dbconfig/20240206-201039-marostegui.json
  • 20:10 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@5f38647]: (no justification provided)
  • 20:07 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
  • 20:07 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
  • 19:57 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@93fa570]: (no justification provided) (duration: 00m 28s)
  • 19:56 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@93fa570]: (no justification provided)
  • 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T355609)', diff saved to https://phabricator.wikimedia.org/P56372 and previous config saved to /var/cache/conftool/dbconfig/20240206-195532-marostegui.json
  • 19:53 joal@deploy2002: Finished deploy [analytics/refinery@718fc41] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@718fc417] (duration: 03m 33s)
  • 19:49 joal@deploy2002: Started deploy [analytics/refinery@718fc41] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@718fc417]
  • 19:49 joal@deploy2002: Finished deploy [analytics/refinery@718fc41] (thin): Regular analytics weekly train THIN [analytics/refinery@718fc417] (duration: 00m 06s)
  • 19:49 joal@deploy2002: Started deploy [analytics/refinery@718fc41] (thin): Regular analytics weekly train THIN [analytics/refinery@718fc417]
  • 19:47 joal@deploy2002: Finished deploy [analytics/refinery@718fc41]: Regular analytics weekly train [analytics/refinery@718fc417] (duration: 12m 17s)
  • 19:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T355609)', diff saved to https://phabricator.wikimedia.org/P56371 and previous config saved to /var/cache/conftool/dbconfig/20240206-194639-marostegui.json
  • 19:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T355609)', diff saved to https://phabricator.wikimedia.org/P56370 and previous config saved to /var/cache/conftool/dbconfig/20240206-194558-marostegui.json
  • 19:35 joal@deploy2002: Started deploy [analytics/refinery@718fc41]: Regular analytics weekly train [analytics/refinery@718fc417]
  • 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P56368 and previous config saved to /var/cache/conftool/dbconfig/20240206-193052-marostegui.json
  • 19:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "finish cloudelastic1009 private IP migration - bking@cumin2002 - T355617"
  • 19:22 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "finish cloudelastic1009 private IP migration - bking@cumin2002 - T355617"
  • 19:21 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.42.0-wmf.17 refs T354435
  • 19:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye
  • 19:21 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P56367 and previous config saved to /var/cache/conftool/dbconfig/20240206-191544-marostegui.json
  • 19:06 brennen: train 1.42.0-wmf.17: considering unblocked for group0, rolling forward.
  • 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T355609)', diff saved to https://phabricator.wikimedia.org/P56366 and previous config saved to /var/cache/conftool/dbconfig/20240206-190037-marostegui.json
  • 18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T355609)', diff saved to https://phabricator.wikimedia.org/P56365 and previous config saved to /var/cache/conftool/dbconfig/20240206-185223-marostegui.json
  • 18:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T355609)', diff saved to https://phabricator.wikimedia.org/P56364 and previous config saved to /var/cache/conftool/dbconfig/20240206-185201-marostegui.json
  • 18:48 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase[2013-2020].codfw.wmnet
  • 18:48 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:48 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2013-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 18:47 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2013-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 18:45 eevans@cumin1002: START - Cookbook sre.dns.netbox
  • 18:42 oblivian@deploy2002: Finished scap: Backport for Do not add env variables when they're empty (T356780) (duration: 11m 57s)
  • 18:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P56363 and previous config saved to /var/cache/conftool/dbconfig/20240206-183654-marostegui.json
  • 18:36 oblivian@deploy2002: oblivian: Continuing with sync
  • 18:32 oblivian@deploy2002: oblivian: Backport for Do not add env variables when they're empty (T356780) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:30 oblivian@deploy2002: Started scap: Backport for Do not add env variables when they're empty (T356780)
  • 18:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 7 hosts with reason: T355860
  • 18:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 7 hosts with reason: T355860
  • 18:22 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[2013-2020].codfw.wmnet
  • 18:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P56362 and previous config saved to /var/cache/conftool/dbconfig/20240206-182148-marostegui.json
  • 18:20 kamila_: wikikube codfw: uncordon new nodes
  • 18:13 kamila_: wikikube codfw: belated homer commit of new nodes
  • 18:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T355609)', diff saved to https://phabricator.wikimedia.org/P56360 and previous config saved to /var/cache/conftool/dbconfig/20240206-180641-marostegui.json
  • 17:59 kamila_: wikikube codfw: drain newly added nodes
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T355609)', diff saved to https://phabricator.wikimedia.org/P56359 and previous config saved to /var/cache/conftool/dbconfig/20240206-175822-marostegui.json
  • 17:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 17:58 claime: uncordoning kubernetes2010
  • 17:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T355609)', diff saved to https://phabricator.wikimedia.org/P56358 and previous config saved to /var/cache/conftool/dbconfig/20240206-175800-marostegui.json
  • 17:56 kamila_: wikikube: cordon nodes added earlier today in codfw
  • 17:51 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2010.codfw.wmnet
  • 17:47 bking@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 17:43 cgoubert@cumin2002: START - Cookbook sre.hosts.reboot-single for host kubernetes2010.codfw.wmnet
  • 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P56357 and previous config saved to /var/cache/conftool/dbconfig/20240206-174253-marostegui.json
  • 17:37 claime: rebooting kubernetes2010.codfw.wmnet
  • 17:36 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sessionstore[1001-1003].eqiad.wmnet
  • 17:36 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sessionstore[1001-1003].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 17:35 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sessionstore[1001-1003].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 17:33 eevans@cumin1002: START - Cookbook sre.dns.netbox
  • 17:27 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=mw.*,dc=eqiad,cluster=kubernetes,service=kubesvc
  • 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P56356 and previous config saved to /var/cache/conftool/dbconfig/20240206-172747-marostegui.json
  • 17:27 cgoubert@cumin2002: conftool action : set/weight=10; selector: name=mw.*,dc=eqiad,cluster=kubernetes,service=kubesvc
  • 17:26 oblivian@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,service=kubesvc,name=mw.*
  • 17:25 oblivian@puppetmaster1001: conftool action : set/weight=10; selector: dc=codfw,service=kubesvc,name=mw.*
  • 17:22 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts sessionstore[1001-1003].eqiad.wmnet
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T355609)', diff saved to https://phabricator.wikimedia.org/P56355 and previous config saved to /var/cache/conftool/dbconfig/20240206-171240-marostegui.json
  • 17:11 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
  • 17:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T355609)', diff saved to https://phabricator.wikimedia.org/P56354 and previous config saved to /var/cache/conftool/dbconfig/20240206-170431-marostegui.json
  • 17:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T355609)', diff saved to https://phabricator.wikimedia.org/P56353 and previous config saved to /var/cache/conftool/dbconfig/20240206-170408-marostegui.json
  • 16:54 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
  • 16:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
  • 16:51 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
  • 16:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P56352 and previous config saved to /var/cache/conftool/dbconfig/20240206-164902-marostegui.json
  • 16:38 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 16:35 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
  • 16:35 claime: Roll-restarting mw-api-ext deployment in codfw
  • 16:34 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.mgmt.eqiad.wmnet on all recursors
  • 16:34 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.mgmt.eqiad.wmnet on all recursors
  • 16:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P56349 and previous config saved to /var/cache/conftool/dbconfig/20240206-163355-marostegui.json
  • 16:30 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=(cdn|ats-be)
  • 16:30 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=(cdn|ats-be)
  • 16:29 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2016.codfw.wmnet
  • 16:29 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
  • 16:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp[2033-2034].codfw.wmnet
  • 16:29 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp[2033-2034].codfw.wmnet
  • 16:27 Daimona: T353459 Running mwscript CampaignEvents:GenerateInvitationList --wiki=metawiki --listfile=/home/daimona/list.txt
  • 16:26 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2087*,elastic2037*,elastic2038*,elastic2055*,elastic2088*,elastic2073*,elastic2074* for switch maintenance - bking@cumin2002 - T355860
  • 16:26 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2087*,elastic2037*,elastic2038*,elastic2055*,elastic2088*,elastic2073*,elastic2074* for switch maintenance - bking@cumin2002 - T355860
  • 16:26 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T355609)', diff saved to https://phabricator.wikimedia.org/P56348 and previous config saved to /var/cache/conftool/dbconfig/20240206-161849-marostegui.json
  • 16:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 16:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 16:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:10 topranks: Hosts migrated and basic connectivity ok codfw rack B4 T355860
  • 16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T355609)', diff saved to https://phabricator.wikimedia.org/P56347 and previous config saved to /var/cache/conftool/dbconfig/20240206-161043-marostegui.json
  • 16:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 16:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
  • 16:07 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
  • 16:05 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1009 to private IPs - bking@cumin2002"
  • 16:05 topranks: Commencing server uplink moves from old switch to new in codfw rack B4 T355860
  • 16:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: migrate cloudelastic1009 to private IPs - bking@cumin2002"
  • 16:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: Migrate servers in codfw rack B4 from asw-b4-codfw to lsw1-b4-codfw
  • 16:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: Migrate servers in codfw rack B4 from asw-b4-codfw to lsw1-b4-codfw
  • 16:01 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 16:01 jgiannelos@deploy2002: Finished deploy [restbase/deploy@05fa5c9]: Disabling storage for ptwiki (duration: 17m 39s)
  • 16:00 topranks: configuring lsw1-b4-codfw with port config for new hosts T355860
  • 15:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 15:59 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw with reason: prepping for server uplink migration
  • 15:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw with reason: prepping for server uplink migration
  • 15:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 15:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on asw-b-codfw,lsw1-b4-codfw.mgmt with reason: prepping for server uplink migration
  • 15:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on asw-b-codfw,lsw1-b4-codfw.mgmt with reason: prepping for server uplink migration
  • 15:56 topranks: moving Netbox server uplinks from asw-b4-codfw to lsw1-b4-codfw to prep config for server moves T355860
  • 15:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudelastic1009.wikimedia.org
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 btullis@deploy2002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 15:51 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2033-2034].codfw.wmnet with reason: T355860
  • 15:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2033-2034].codfw.wmnet with reason: T355860
  • 15:48 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic1009.wikimedia.org
  • 15:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:46 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db2169.codfw.wmnet onto db2194.codfw.wmnet
  • 15:44 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2058*,elastic2070*,elastic2095*,elastic2096* for switch maintenance - bking@cumin2002 - T355860
  • 15:44 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2058*,elastic2070*,elastic2095*,elastic2096* for switch maintenance - bking@cumin2002 - T355860
  • 15:44 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2094.codfw.wmnet with OS bullseye
  • 15:43 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2058* for switch maintenance - bking@cumin2002 - T355860
  • 15:43 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2058* for switch maintenance - bking@cumin2002 - T355860
  • 15:43 jgiannelos@deploy2002: Started deploy [restbase/deploy@05fa5c9]: Disabling storage for ptwiki
  • 15:43 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2058 for switch maintenance - bking@cumin2002 - T355860
  • 15:43 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2058 for switch maintenance - bking@cumin2002 - T355860
  • 15:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2058,elastic2070,elastic2095,elastic2096 for switch maintenance - bking@cumin2002 - T355860
  • 15:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2058,elastic2070,elastic2095,elastic2096 for switch maintenance - bking@cumin2002 - T355860
  • 15:41 btullis@deploy2002: helmfile [codfw] START helmfile.d/services/datahub: sync on main
  • 15:41 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: eelastic2058,elastic2070,elastic2095,elastic2096 for switch maintenance - bking@cumin2002 - T355860
  • 15:41 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: eelastic2058,elastic2070,elastic2095,elastic2096 for switch maintenance - bking@cumin2002 - T355860
  • 15:37 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning all hosts in row B for switch maintenance - bking@cumin2002 - T355860
  • 15:37 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning all hosts in row B for switch maintenance - bking@cumin2002 - T355860
  • 15:34 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning all hosts in row B4 for switch maintenance - bking@cumin2002 - T355860
  • 15:34 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning all hosts in row B4 for switch maintenance - bking@cumin2002 - T355860
  • 15:28 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 15:27 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning all hosts in row B for switch maintenance - bking@cumin2002 - T355860
  • 15:27 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning all hosts in row B for switch maintenance - bking@cumin2002 - T355860
  • 15:26 bking@cumin2002: conftool action : set/pooled=no; selector: name=wdqs2016.codfw.wmnet
  • 15:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage
  • 15:25 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning all hosts in row B for switch maintenance - bking@cumin2002 - T355860
  • 15:25 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning all hosts in row B for switch maintenance - bking@cumin2002 - T355860
  • 15:23 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2094.codfw.wmnet with reason: host reimage
  • 15:14 filippo@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:14 filippo@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 15:11 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
  • 15:07 topranks: Disabling netbox service on netbox1002 prior to db restore from backup
  • 15:06 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P56344 and previous config saved to /var/cache/conftool/dbconfig/20240206-150649-root.json
  • 15:06 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 14:56 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 14:54 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bullseye
  • 14:54 hashar@deploy2002: Finished deploy [gerrit/gerrit@2e441ac]: wm-checks-api: handle Zuul 'Merge failed' messages - T356647 (duration: 00m 07s)
  • 14:54 hashar@deploy2002: Started deploy [gerrit/gerrit@2e441ac]: wm-checks-api: handle Zuul 'Merge failed' messages - T356647
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P56343 and previous config saved to /var/cache/conftool/dbconfig/20240206-145144-root.json
  • 14:51 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.convert-disks (exit_code=99) for host ms-be2044
  • 14:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudelastic1009.wikimedia.org
  • 14:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1009.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:48 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudelastic1009.wikimedia.org decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
  • 14:47 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
  • 14:45 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:45 bking@cumin2002: START - Cookbook sre.dns.netbox
  • 14:44 logmsgbot: lucaswerkmeister-wmde@deploy2002 Finished scap: Backport for Load Filepage.css when previewing File pages (T356505) (duration: 10m 51s)
  • 14:38 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudelastic1009.wikimedia.org
  • 14:37 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and lucaswerkmeister: Continuing with sync
  • 14:36 logmsgbot: lucaswerkmeister-wmde@deploy2002 lucaswerkmeister-wmde and lucaswerkmeister: Backport for Load Filepage.css when previewing File pages (T356505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P56341 and previous config saved to /var/cache/conftool/dbconfig/20240206-143639-root.json
  • 14:33 logmsgbot: lucaswerkmeister-wmde@deploy2002 Started scap: Backport for Load Filepage.css when previewing File pages (T356505)
  • 14:32 mvernon@cumin2002: START - Cookbook sre.swift.convert-disks for host ms-be2044
  • 14:32 Emperor: debug convert-disks cookbook against out-of-use ms-be2044 T308677
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
  • 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P56340 and previous config saved to /var/cache/conftool/dbconfig/20240206-142134-root.json
  • 14:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
  • 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::webserver
  • 14:09 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes knwiki --fix # T355662 (crashed)
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P56339 and previous config saved to /var/cache/conftool/dbconfig/20240206-140629-root.json
  • 14:05 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::webserver
  • 14:02 jmm@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad
  • 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host config-master1001.eqiad.wmnet
  • 14:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2194.codfw.wmnet with OS bookworm
  • 14:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-web1001.eqiad.wmnet with OS bullseye
  • 13:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host config-master1001.eqiad.wmnet
  • 13:57 jmm@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=eqiad
  • 13:56 jmm@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=eqiad
  • 13:56 jmm@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=config-master,name=codfw
  • 13:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host config-master2001.codfw.wmnet
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host config-master2001.codfw.wmnet
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P56338 and previous config saved to /var/cache/conftool/dbconfig/20240206-135124-root.json
  • 13:50 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2356.codfw.wmnet with OS bullseye
  • 13:50 jmm@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=config-master,name=codfw
  • 13:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2354.codfw.wmnet with OS bullseye
  • 13:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2352.codfw.wmnet with OS bullseye
  • 13:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:39 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): Ensure that all eqiad nodes are running the same revision (duration: 00m 31s)
  • 13:38 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): Ensure that all eqiad nodes are running the same revision
  • 13:38 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (codfw): Ensure that all codfw nodes are running the same revision (duration: 00m 32s)
  • 13:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: host reimage
  • 13:37 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (codfw): Ensure that all codfw nodes are running the same revision
  • 13:37 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 13:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-web1001.eqiad.wmnet with reason: host reimage
  • 13:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P56337 and previous config saved to /var/cache/conftool/dbconfig/20240206-133619-root.json
  • 13:34 moritzm: installing openjdk-17 security updates
  • 13:33 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-web1001.eqiad.wmnet with reason: host reimage
  • 13:32 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2319.codfw.wmnet with OS bullseye
  • 13:32 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 13:30 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2356.codfw.wmnet with reason: host reimage
  • 13:30 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2318.codfw.wmnet with OS bullseye
  • 13:29 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (codfw): (no justification provided) (duration: 00m 17s)
  • 13:29 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (codfw): (no justification provided)
  • 13:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2317.codfw.wmnet with OS bullseye
  • 13:28 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2356.codfw.wmnet with reason: host reimage
  • 13:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2354.codfw.wmnet with reason: host reimage
  • 13:25 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2352.codfw.wmnet with reason: host reimage
  • 13:24 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2354.codfw.wmnet with reason: host reimage
  • 13:22 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1408.eqiad.wmnet with OS bullseye
  • 13:22 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2352.codfw.wmnet with reason: host reimage
  • 13:22 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2350.codfw.wmnet with reason: host reimage
  • 13:22 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 12s)
  • 13:21 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 13:21 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 05s)
  • 13:20 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 13:20 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1396.eqiad.wmnet with OS bullseye
  • 13:19 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2350.codfw.wmnet with reason: host reimage
  • 13:19 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db2194.codfw.wmnet with OS bookworm
  • 13:17 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1394.eqiad.wmnet with OS bullseye
  • 13:16 moritzm: pruning unneeded openjdk-17-jre-headless packages on restbase* hosts
  • 13:15 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-web1001.eqiad.wmnet with OS bullseye
  • 13:15 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1392.eqiad.wmnet with OS bullseye
  • 13:14 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 05s)
  • 13:14 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 13:13 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2319.codfw.wmnet with reason: host reimage
  • 13:11 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2356.codfw.wmnet with OS bullseye
  • 13:11 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2318.codfw.wmnet with reason: host reimage
  • 13:11 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2319.codfw.wmnet with reason: host reimage
  • 13:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1388.eqiad.wmnet
  • 13:10 kamila@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1388.eqiad.wmnet
  • 13:10 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-web1001.eqiad.wmnet with OS bullseye
  • 13:09 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2318.codfw.wmnet with reason: host reimage
  • 13:09 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2317.codfw.wmnet with reason: host reimage
  • 13:07 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2354.codfw.wmnet with OS bullseye
  • 13:06 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1388.eqiad.wmnet with OS bullseye
  • 13:05 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2352.codfw.wmnet with OS bullseye
  • 13:04 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1408.eqiad.wmnet with reason: host reimage
  • 13:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1390.eqiad.wmnet with OS bullseye
  • 13:03 claime: Relaunching build-production-images
  • 13:02 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2350.codfw.wmnet with OS bullseye
  • 13:02 aokoth@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM vrts1002.eqiad.wmnet
  • 13:02 claime: build2001 - Total reclaimed space: 23.31GB
  • 13:01 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1396.eqiad.wmnet with reason: host reimage
  • 13:01 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 01s)
  • 13:01 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 13:00 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 01s)
  • 13:00 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 12:59 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-web1001.eqiad.wmnet with OS bullseye
  • 12:59 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1408.eqiad.wmnet with reason: host reimage
  • 12:59 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1394.eqiad.wmnet with reason: host reimage
  • 12:59 claime: Pruning images older than 45 days on build2001: docker image prune -a --filter "until=1080h"/25
  • 12:59 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1396.eqiad.wmnet with reason: host reimage
  • 12:58 aokoth@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM vrts1002.eqiad.wmnet
  • 12:57 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 05s)
  • 12:57 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 12:56 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1392.eqiad.wmnet with reason: host reimage
  • 12:55 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1394.eqiad.wmnet with reason: host reimage
  • 12:54 moritzm: installing openjdk-11 security updates
  • 12:54 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2319.codfw.wmnet with OS bullseye
  • 12:53 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1392.eqiad.wmnet with reason: host reimage
  • 12:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2318.codfw.wmnet with OS bullseye
  • 12:51 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1386.eqiad.wmnet with OS bullseye
  • 12:51 jgiannelos@deploy2002: deploy aborted: (no justification provided) (duration: 00m 04s)
  • 12:50 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 12:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2317.codfw.wmnet with OS bullseye
  • 12:46 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1408.eqiad.wmnet with OS bullseye
  • 12:45 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 05s)
  • 12:45 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1396.eqiad.wmnet with OS bullseye
  • 12:45 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 12:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1390.eqiad.wmnet with reason: host reimage
  • 12:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1392.eqiad.wmnet with OS bullseye
  • 12:39 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1388.eqiad.wmnet with reason: host reimage
  • 12:37 volans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:34 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1386.eqiad.wmnet with reason: host reimage
  • 12:31 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1386.eqiad.wmnet with reason: host reimage
  • 12:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1029.eqiad.wmnet with OS bookworm
  • 12:28 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1390.eqiad.wmnet with OS bullseye
  • 12:26 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1388.eqiad.wmnet with OS bullseye
  • 12:21 volans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:18 volans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:18 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1386.eqiad.wmnet with OS bullseye
  • 12:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 12:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1029.eqiad.wmnet with reason: host reimage
  • 12:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1029.eqiad.wmnet with reason: host reimage
  • 12:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56335 and previous config saved to /var/cache/conftool/dbconfig/20240206-121034-marostegui.json
  • 12:02 volans@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 11:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1029.eqiad.wmnet with OS bookworm
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315', diff saved to https://phabricator.wikimedia.org/P56334 and previous config saved to /var/cache/conftool/dbconfig/20240206-115527-marostegui.json
  • 11:49 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-web1001.eqiad.wmnet with OS bullseye
  • 11:46 volans@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host mw1408.eqiad.wmnet
  • 11:43 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-web1001.eqiad.wmnet with reason: host reimage
  • 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315', diff saved to https://phabricator.wikimedia.org/P56332 and previous config saved to /var/cache/conftool/dbconfig/20240206-114020-marostegui.json
  • 11:40 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-web1001.eqiad.wmnet with reason: host reimage
  • 11:31 ladsgroup@deploy2002: Finished scap: Backport for Switch the pagelinks default to add read new (T351237) (duration: 10m 38s)
  • 11:30 volans@cumin1002: START - Cookbook sre.hosts.dhcp for host mw1408.eqiad.wmnet
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56331 and previous config saved to /var/cache/conftool/dbconfig/20240206-112514-marostegui.json
  • 11:25 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:22 ladsgroup@deploy2002: ladsgroup: Backport for Switch the pagelinks default to add read new (T351237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:22 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 04s)
  • 11:22 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 11:21 ladsgroup@deploy2002: Started scap: Backport for Switch the pagelinks default to add read new (T351237)
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56330 and previous config saved to /var/cache/conftool/dbconfig/20240206-111923-marostegui.json
  • 11:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T355609)', diff saved to https://phabricator.wikimedia.org/P56329 and previous config saved to /var/cache/conftool/dbconfig/20240206-111901-marostegui.json
  • 11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host stat1010.eqiad.wmnet
  • 11:13 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host stat1010.eqiad.wmnet
  • 11:12 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 05s)
  • 11:12 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P56328 and previous config saved to /var/cache/conftool/dbconfig/20240206-110354-marostegui.json
  • 11:03 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-web1001.eqiad.wmnet with OS bullseye
  • 10:57 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided) (duration: 00m 05s)
  • 10:57 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683] (eqiad): (no justification provided)
  • 10:53 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 100%: 5', diff saved to https://phabricator.wikimedia.org/P56327 and previous config saved to /var/cache/conftool/dbconfig/20240206-105328-arnaudb.json
  • 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P56326 and previous config saved to /var/cache/conftool/dbconfig/20240206-104848-marostegui.json
  • 10:45 jgiannelos@deploy2002: Finished deploy [kartotherian/deploy@3325683]: (no justification provided) (duration: 00m 22s)
  • 10:45 jgiannelos@deploy2002: Started deploy [kartotherian/deploy@3325683]: (no justification provided)
  • 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 75%: 5', diff saved to https://phabricator.wikimedia.org/P56325 and previous config saved to /var/cache/conftool/dbconfig/20240206-103823-arnaudb.json
  • 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T355609)', diff saved to https://phabricator.wikimedia.org/P56324 and previous config saved to /var/cache/conftool/dbconfig/20240206-103341-marostegui.json
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P56323 and previous config saved to /var/cache/conftool/dbconfig/20240206-103133-root.json
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T355609)', diff saved to https://phabricator.wikimedia.org/P56322 and previous config saved to /var/cache/conftool/dbconfig/20240206-102932-marostegui.json
  • 10:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56321 and previous config saved to /var/cache/conftool/dbconfig/20240206-102445-marostegui.json
  • 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 50%: 5', diff saved to https://phabricator.wikimedia.org/P56320 and previous config saved to /var/cache/conftool/dbconfig/20240206-102318-arnaudb.json
  • 10:22 akosiaris: roll restart all pods in wikikube@codfw, wikikube@staging-codfw, wikikube@staging-eqiad
  • 10:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
  • 10:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P56319 and previous config saved to /var/cache/conftool/dbconfig/20240206-101628-root.json
  • 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P56317 and previous config saved to /var/cache/conftool/dbconfig/20240206-100938-marostegui.json
  • 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 40%: 5', diff saved to https://phabricator.wikimedia.org/P56316 and previous config saved to /var/cache/conftool/dbconfig/20240206-100813-arnaudb.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P56315 and previous config saved to /var/cache/conftool/dbconfig/20240206-100123-root.json
  • 09:56 moritzm: installing mariadb-10.5 security/bugfix updates from Bullseye point release (as packaged by Debian, unrelated to wmf-mariadb packages)
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P56314 and previous config saved to /var/cache/conftool/dbconfig/20240206-095432-marostegui.json
  • 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 30%: 5', diff saved to https://phabricator.wikimedia.org/P56313 and previous config saved to /var/cache/conftool/dbconfig/20240206-095308-arnaudb.json
  • 09:47 akosiaris: roll restart all pods in wikikube@eqiad
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P56312 and previous config saved to /var/cache/conftool/dbconfig/20240206-094617-root.json
  • 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56311 and previous config saved to /var/cache/conftool/dbconfig/20240206-093925-marostegui.json
  • 09:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 20%: 5', diff saved to https://phabricator.wikimedia.org/P56310 and previous config saved to /var/cache/conftool/dbconfig/20240206-093803-arnaudb.json
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56309 and previous config saved to /var/cache/conftool/dbconfig/20240206-093440-marostegui.json
  • 09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T355609)', diff saved to https://phabricator.wikimedia.org/P56308 and previous config saved to /var/cache/conftool/dbconfig/20240206-093418-marostegui.json
  • 09:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P56307 and previous config saved to /var/cache/conftool/dbconfig/20240206-093112-root.json
  • 09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1235 (re)pooling @ 10%: 5', diff saved to https://phabricator.wikimedia.org/P56306 and previous config saved to /var/cache/conftool/dbconfig/20240206-092257-arnaudb.json
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P56305 and previous config saved to /var/cache/conftool/dbconfig/20240206-091911-marostegui.json
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P56304 and previous config saved to /var/cache/conftool/dbconfig/20240206-091607-root.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P56303 and previous config saved to /var/cache/conftool/dbconfig/20240206-090405-marostegui.json
  • 09:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host build2001.codfw.wmnet
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P56302 and previous config saved to /var/cache/conftool/dbconfig/20240206-090102-root.json
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T355609)', diff saved to https://phabricator.wikimedia.org/P56301 and previous config saved to /var/cache/conftool/dbconfig/20240206-084858-marostegui.json
  • 08:47 moritzm: pruning unneeded openjdk-17-jre-headless packages on aqs* hosts
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T355609)', diff saved to https://phabricator.wikimedia.org/P56300 and previous config saved to /var/cache/conftool/dbconfig/20240206-084315-marostegui.json
  • 08:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T355609)', diff saved to https://phabricator.wikimedia.org/P56299 and previous config saved to /var/cache/conftool/dbconfig/20240206-084253-marostegui.json
  • 08:42 slyngs: Increase severity of failed systemd units when alerting from AlertManager
  • 08:32 moritzm: pruning unneeded openjdk-17-jre-headless packages on ml-cache* hosts
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P56298 and previous config saved to /var/cache/conftool/dbconfig/20240206-082746-marostegui.json
  • 08:17 hoo@deploy2002: Finished scap: Backport for Add wgVirtualDomainsMapping for Cognate (T348526) (duration: 08m 51s)
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P56297 and previous config saved to /var/cache/conftool/dbconfig/20240206-081239-marostegui.json
  • 08:11 hoo@deploy2002: hoo: Continuing with sync
  • 08:10 hoo@deploy2002: hoo: Backport for Add wgVirtualDomainsMapping for Cognate (T348526) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:08 hoo@deploy2002: Started scap: Backport for Add wgVirtualDomainsMapping for Cognate (T348526)
  • 08:06 hoo@deploy2002: backport Cancelled
  • 08:00 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 08:00 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 07:57 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T355609)', diff saved to https://phabricator.wikimedia.org/P56296 and previous config saved to /var/cache/conftool/dbconfig/20240206-075733-marostegui.json
  • 07:57 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 07:56 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 07:56 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T355609)', diff saved to https://phabricator.wikimedia.org/P56295 and previous config saved to /var/cache/conftool/dbconfig/20240206-075251-marostegui.json
  • 07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T355609)', diff saved to https://phabricator.wikimedia.org/P56294 and previous config saved to /var/cache/conftool/dbconfig/20240206-075228-marostegui.json
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P56293 and previous config saved to /var/cache/conftool/dbconfig/20240206-073721-marostegui.json
  • 07:25 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es1029.eqiad.wmnet with OS bullseye
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P56292 and previous config saved to /var/cache/conftool/dbconfig/20240206-072215-marostegui.json
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T355609)', diff saved to https://phabricator.wikimedia.org/P56291 and previous config saved to /var/cache/conftool/dbconfig/20240206-070708-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T355609)', diff saved to https://phabricator.wikimedia.org/P56290 and previous config saved to /var/cache/conftool/dbconfig/20240206-070251-marostegui.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T355609)', diff saved to https://phabricator.wikimedia.org/P56289 and previous config saved to /var/cache/conftool/dbconfig/20240206-070228-marostegui.json
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P56288 and previous config saved to /var/cache/conftool/dbconfig/20240206-064722-marostegui.json
  • 06:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1029.eqiad.wmnet with OS bullseye
  • 06:37 marostegui@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es1029.eqiad.wmnet with OS bookworm
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P56287 and previous config saved to /var/cache/conftool/dbconfig/20240206-063215-marostegui.json
  • 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T355609)', diff saved to https://phabricator.wikimedia.org/P56285 and previous config saved to /var/cache/conftool/dbconfig/20240206-061116-marostegui.json
  • 06:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1186', diff saved to https://phabricator.wikimedia.org/P56284 and previous config saved to /var/cache/conftool/dbconfig/20240206-060942-root.json
  • 06:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 06:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 06:01 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host es1029.eqiad.wmnet with OS bookworm
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1029 T351916', diff saved to https://phabricator.wikimedia.org/P56283 and previous config saved to /var/cache/conftool/dbconfig/20240206-055835-root.json
  • 04:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.42.0-wmf.17 refs T354435 (duration: 51m 02s)
  • 04:04 mwpresync@deploy2002: Started scap: testwikis wikis to 1.42.0-wmf.17 refs T354435
  • 04:02 mwpresync@deploy2002: Pruned MediaWiki: 1.42.0-wmf.14 (duration: 02m 07s)
  • 03:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[2030-2035].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 02:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2030-2035].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 02:00 eevans@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase[2026-2035].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 01:19 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2026-2035].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 01:18 eevans@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase[2021-2035].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 00:29 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[2021-2035].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 00:26 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1028-1033].eqiad.wmnet: Apply updated JVM — T356648 - eevans@cumin1002

2024-02-05

  • 23:39 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply cluster settings before private IP migration - bking@cumin2002 - T355617
  • 23:31 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1028-1033].eqiad.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 23:19 eevans@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:restbase-eqiad: Apply updated JVM — T356648 - eevans@cumin1002
  • 23:12 zabe@deploy2002: Finished scap: Backport for MobileFrontend hook should not apply outside mobile view (T356711) (duration: 09m 18s)
  • 23:06 zabe@deploy2002: zabe: Continuing with sync
  • 23:04 zabe@deploy2002: zabe: Backport for MobileFrontend hook should not apply outside mobile view (T356711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:03 zabe@deploy2002: Started scap: Backport for MobileFrontend hook should not apply outside mobile view (T356711)
  • 22:30 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply cluster settings before private IP migration - bking@cumin2002 - T355617
  • 22:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudelastic1009.wikimedia.org with reason: T355617
  • 22:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudelastic1009.wikimedia.org with reason: T355617
  • 22:25 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1009.wikimedia.org for migrate cloudelastic1009 to private IP - bking@cumin2002 - T355617
  • 22:25 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1009.wikimedia.org for migrate cloudelastic1009 to private IP - bking@cumin2002 - T355617
  • 22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T355609)', diff saved to https://phabricator.wikimedia.org/P56282 and previous config saved to /var/cache/conftool/dbconfig/20240205-222507-marostegui.json
  • 22:23 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on sessionstore1003.eqiad.wmnet with reason: Decommissioning — T353405
  • 22:23 bking@cumin2002: conftool action : set/pooled=no; selector: name=cloudelastic1009.wikimedia.org
  • 22:23 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on sessionstore1003.eqiad.wmnet with reason: Decommissioning — T353405
  • 22:23 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on sessionstore1002.eqiad.wmnet with reason: Decommissioning — T353405
  • 22:23 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on sessionstore1002.eqiad.wmnet with reason: Decommissioning — T353405
  • 22:23 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on sessionstore1001.eqiad.wmnet with reason: Decommissioning — T353405
  • 22:22 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on sessionstore1001.eqiad.wmnet with reason: Decommissioning — T353405
  • 22:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P56281 and previous config saved to /var/cache/conftool/dbconfig/20240205-221001-marostegui.json
  • 22:06 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncmonitor1001.eqiad.wmnet - brett@cumin2002"
  • 22:05 urandom: Decommissioning Cassandra, sessionstore1001 — T353405
  • 22:05 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncmonitor1001.eqiad.wmnet - brett@cumin2002"
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncmonitor1001.eqiad.wmnet on all recursors
  • 22:04 brett@cumin2002: START - Cookbook sre.dns.wipe-cache ncmonitor1001.eqiad.wmnet on all recursors
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:04 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncmonitor1001.eqiad.wmnet - brett@cumin2002"
  • 22:03 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncmonitor1001.eqiad.wmnet - brett@cumin2002"
  • 22:01 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 22:01 brett@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncmonitor1001.eqiad.wmnet
  • 21:58 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply updated JVM — T356648 - eevans@cumin1002
  • 21:56 cjming: end of UTC late backport window
  • 21:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[1001-1006].eqiad.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 21:55 cjming@deploy2002: Finished scap: Backport for Use decodeURI for comment ID searches as well as heading searches (T356199) (duration: 13m 08s)
  • 21:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P56280 and previous config saved to /var/cache/conftool/dbconfig/20240205-215454-marostegui.json
  • 21:49 cjming@deploy2002: cjming and kemayo: Continuing with sync
  • 21:43 cjming@deploy2002: cjming and kemayo: Backport for Use decodeURI for comment ID searches as well as heading searches (T356199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:42 cjming@deploy2002: Started scap: Backport for Use decodeURI for comment ID searches as well as heading searches (T356199)
  • 21:40 cjming@deploy2002: Finished scap: Backport for Enable desktop diff HTML on mobile pages for all logged in users (T350181) (duration: 09m 29s)
  • 21:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2003.codfw.wmnet with OS bullseye
  • 21:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T355609)', diff saved to https://phabricator.wikimedia.org/P56279 and previous config saved to /var/cache/conftool/dbconfig/20240205-213947-marostegui.json
  • 21:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T355609)', diff saved to https://phabricator.wikimedia.org/P56278 and previous config saved to /var/cache/conftool/dbconfig/20240205-213726-marostegui.json
  • 21:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 21:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 21:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T355609)', diff saved to https://phabricator.wikimedia.org/P56277 and previous config saved to /var/cache/conftool/dbconfig/20240205-213703-marostegui.json
  • 21:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 21:34 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 21:32 cjming@deploy2002: jdlrobson and cjming: Backport for Enable desktop diff HTML on mobile pages for all logged in users (T350181) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:31 cjming@deploy2002: Started scap: Backport for Enable desktop diff HTML on mobile pages for all logged in users (T350181)
  • 21:27 cjming@deploy2002: Finished scap: Backport for Turn on DT visual enhancements on wikitech (T355374) (duration: 11m 12s)
  • 21:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P56276 and previous config saved to /var/cache/conftool/dbconfig/20240205-212157-marostegui.json
  • 21:21 cjming@deploy2002: cjming and cscott: Continuing with sync
  • 21:20 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[1001-1006].eqiad.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 21:17 cjming@deploy2002: cjming and cscott: Backport for Turn on DT visual enhancements on wikitech (T355374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw2317.codfw.wmnet with OS bullseye
  • 21:16 cjming@deploy2002: Started scap: Backport for Turn on DT visual enhancements on wikitech (T355374)
  • 21:12 cjming@deploy2002: Finished scap: Backport for Update Android Metrics Platform stream configs (T355360) (duration: 09m 10s)
  • 21:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P56275 and previous config saved to /var/cache/conftool/dbconfig/20240205-210650-marostegui.json
  • 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[2001-2003].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 21:05 cjming@deploy2002: cjming: Continuing with sync
  • 21:05 cjming@deploy2002: cjming: Backport for Update Android Metrics Platform stream configs (T355360) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:02 cjming@deploy2002: Started scap: Backport for Update Android Metrics Platform stream configs (T355360)
  • 20:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T355609)', diff saved to https://phabricator.wikimedia.org/P56274 and previous config saved to /var/cache/conftool/dbconfig/20240205-205144-marostegui.json
  • 20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T355609)', diff saved to https://phabricator.wikimedia.org/P56273 and previous config saved to /var/cache/conftool/dbconfig/20240205-204922-marostegui.json
  • 20:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56272 and previous config saved to /var/cache/conftool/dbconfig/20240205-204900-marostegui.json
  • 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[2001-2003].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 20:44 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs20[04-12].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 20:39 eevans@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 20:38 eevans@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 20:36 eevans@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 20:35 eevans@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 20:34 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 20:34 eevans@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 20:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P56271 and previous config saved to /var/cache/conftool/dbconfig/20240205-203353-marostegui.json
  • 20:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and A:durum
  • 20:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P56270 and previous config saved to /var/cache/conftool/dbconfig/20240205-201847-marostegui.json
  • 20:06 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1396.eqiad.wmnet with OS bullseye
  • 20:05 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1408.eqiad.wmnet with OS bullseye
  • 20:04 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1394.eqiad.wmnet with OS bullseye
  • 20:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56269 and previous config saved to /var/cache/conftool/dbconfig/20240205-200340-marostegui.json
  • 20:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56268 and previous config saved to /var/cache/conftool/dbconfig/20240205-200119-marostegui.json
  • 20:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T355609)', diff saved to https://phabricator.wikimedia.org/P56267 and previous config saved to /var/cache/conftool/dbconfig/20240205-200056-marostegui.json
  • 19:58 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw2317.codfw.wmnet with OS bullseye
  • 19:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1386.eqiad.wmnet with OS bullseye
  • 19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P56266 and previous config saved to /var/cache/conftool/dbconfig/20240205-194550-marostegui.json
  • 19:44 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1392.eqiad.wmnet with OS bullseye
  • 19:43 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and A:durum
  • 19:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1390.eqiad.wmnet with OS bullseye
  • 19:42 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1388.eqiad.wmnet with OS bullseye
  • 19:41 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1386.eqiad.wmnet with OS bullseye
  • 19:36 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs20[04-12].codfw.wmnet: Apply updated JVM — T356648 - eevans@cumin1002
  • 19:32 eevans@cumin1002: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:aqs-codfw: Apply updated JVM — T356648 - eevans@cumin1002
  • 19:31 eevans@cumin1002: conftool action : set/weight=0; selector: cluster=restbase,dc=codfw,name=restbase2020.codfw.wmnet
  • 19:31 eevans@cumin1002: conftool action : set/weight=0; selector: cluster=restbase,dc=codfw,name=restbase2018.codfw.wmnet
  • 19:31 eevans@cumin1002: conftool action : set/weight=0; selector: cluster=restbase,dc=codfw,name=restbase2017.codfw.wmnet
  • 19:31 eevans@cumin1002: conftool action : set/weight=0; selector: cluster=restbase,dc=codfw,name=restbase2016.codfw.wmnet
  • 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P56265 and previous config saved to /var/cache/conftool/dbconfig/20240205-193043-marostegui.json
  • 19:29 eevans@cumin1002: conftool action : set/weight=0; selector: cluster=restbase,dc=codfw,name=restbase2015.codfw.wmnet
  • 19:28 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and not A:wikidough-drmrs and A:wikidough
  • 19:20 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T355609)', diff saved to https://phabricator.wikimedia.org/P56264 and previous config saved to /var/cache/conftool/dbconfig/20240205-191537-marostegui.json
  • 19:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T355609)', diff saved to https://phabricator.wikimedia.org/P56263 and previous config saved to /var/cache/conftool/dbconfig/20240205-191315-marostegui.json
  • 19:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 19:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 19:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56262 and previous config saved to /var/cache/conftool/dbconfig/20240205-191252-marostegui.json
  • 19:10 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply updated JVM — T356648 - eevans@cumin1002
  • 19:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Apply updated JVM — T356648 - eevans@cumin1002
  • 18:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P56261 and previous config saved to /var/cache/conftool/dbconfig/20240205-185745-marostegui.json
  • 18:47 damilare: civicrm upgraded from 427c40f5 to 684eb057
  • 18:45 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1408.eqiad.wmnet with OS bullseye
  • 18:45 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1396.eqiad.wmnet with OS bullseye
  • 18:44 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1394.eqiad.wmnet with OS bullseye
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P56260 and previous config saved to /var/cache/conftool/dbconfig/20240205-184239-marostegui.json
  • 18:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh1002.wikimedia.org
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh1002.wikimedia.org
  • 18:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56259 and previous config saved to /var/cache/conftool/dbconfig/20240205-182732-marostegui.json
  • 18:25 bking@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 18:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56258 and previous config saved to /var/cache/conftool/dbconfig/20240205-182511-marostegui.json
  • 18:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56257 and previous config saved to /var/cache/conftool/dbconfig/20240205-182448-marostegui.json
  • 18:24 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1392.eqiad.wmnet with OS bullseye
  • 18:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1390.eqiad.wmnet with OS bullseye
  • 18:22 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1388.eqiad.wmnet with OS bullseye
  • 18:22 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host mw1386.eqiad.wmnet with OS bullseye
  • 18:16 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and not A:wikidough-drmrs and A:wikidough
  • 18:15 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough-drmrs and A:wikidough
  • 18:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56256 and previous config saved to /var/cache/conftool/dbconfig/20240205-180942-marostegui.json
  • 18:04 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough-drmrs and A:wikidough
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest2003.codfw.wmnet']
  • 18:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2003.codfw.wmnet']
  • 17:59 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
  • 17:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56255 and previous config saved to /var/cache/conftool/dbconfig/20240205-175435-marostegui.json
  • 17:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 17:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bullseye
  • 17:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 17:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56251 and previous config saved to /var/cache/conftool/dbconfig/20240205-173928-marostegui.json
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56250 and previous config saved to /var/cache/conftool/dbconfig/20240205-173707-marostegui.json
  • 17:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56249 and previous config saved to /var/cache/conftool/dbconfig/20240205-173640-marostegui.json
  • 17:36 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Apply updated JVM — T356648 - eevans@cumin1002
  • 17:25 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2003.codfw.wmnet with OS bullseye
  • 17:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56248 and previous config saved to /var/cache/conftool/dbconfig/20240205-172133-marostegui.json
  • 17:19 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes enwikiquote --add-prefix='Wikiquote:T355195/' --fix
  • 17:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db1135.eqiad.wmnet onto db1235.eqiad.wmnet
  • 17:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3314 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P56247 and previous config saved to /var/cache/conftool/dbconfig/20240205-170737-arnaudb.json
  • 17:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3312 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P56246 and previous config saved to /var/cache/conftool/dbconfig/20240205-170735-arnaudb.json
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56245 and previous config saved to /var/cache/conftool/dbconfig/20240205-170627-marostegui.json
  • 16:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3314 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P56244 and previous config saved to /var/cache/conftool/dbconfig/20240205-165232-arnaudb.json
  • 16:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3312 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P56243 and previous config saved to /var/cache/conftool/dbconfig/20240205-165230-arnaudb.json
  • 16:51 moritzm: pruning unneeded openjdk-17-jre-headless packages on cassandra-dev* hosts
  • 16:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56242 and previous config saved to /var/cache/conftool/dbconfig/20240205-165120-marostegui.json
  • 16:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bullseye
  • 16:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56241 and previous config saved to /var/cache/conftool/dbconfig/20240205-164859-marostegui.json
  • 16:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56240 and previous config saved to /var/cache/conftool/dbconfig/20240205-164836-marostegui.json
  • 16:42 moritzm: pruning unneeded openjdk-17-jre-headless packages on sessionstore* hosts
  • 16:42 sukhe: adding cdobbins to cn=wmf and cn=ops
  • 16:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3314 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P56239 and previous config saved to /var/cache/conftool/dbconfig/20240205-163727-arnaudb.json
  • 16:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3312 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P56238 and previous config saved to /var/cache/conftool/dbconfig/20240205-163725-arnaudb.json
  • 16:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56237 and previous config saved to /var/cache/conftool/dbconfig/20240205-163329-marostegui.json
  • 16:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts debmonitor1002.eqiad.wmnet
  • 16:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: debmonitor1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 16:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3314 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P56236 and previous config saved to /var/cache/conftool/dbconfig/20240205-162222-arnaudb.json
  • 16:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3312 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P56235 and previous config saved to /var/cache/conftool/dbconfig/20240205-162220-arnaudb.json
  • 16:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: debmonitor1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56234 and previous config saved to /var/cache/conftool/dbconfig/20240205-161822-marostegui.json
  • 16:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 16:11 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts debmonitor1002.eqiad.wmnet
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts debmonitor2002.codfw.wmnet
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: debmonitor2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 16:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: debmonitor2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3314 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P56233 and previous config saved to /var/cache/conftool/dbconfig/20240205-160717-arnaudb.json
  • 16:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3312 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P56232 and previous config saved to /var/cache/conftool/dbconfig/20240205-160715-arnaudb.json
  • 16:07 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56231 and previous config saved to /var/cache/conftool/dbconfig/20240205-160316-marostegui.json
  • 16:02 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts debmonitor2002.codfw.wmnet
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56230 and previous config saved to /var/cache/conftool/dbconfig/20240205-160055-marostegui.json
  • 16:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 16:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 16:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 16:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3314 (re)pooling @ 20%: 10', diff saved to https://phabricator.wikimedia.org/P56229 and previous config saved to /var/cache/conftool/dbconfig/20240205-155212-arnaudb.json
  • 15:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3312 (re)pooling @ 20%: 10', diff saved to https://phabricator.wikimedia.org/P56228 and previous config saved to /var/cache/conftool/dbconfig/20240205-155210-arnaudb.json
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: elasticsearch::cloudelastic
  • 15:38 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: elasticsearch::cloudelastic
  • 15:35 claime: Building production images for 987443 - T283861
  • 15:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for debmonitor2003.codfw.wmnet
  • 15:04 slyngshede@cumin1002: START - Cookbook sre.hosts.remove-downtime for debmonitor2003.codfw.wmnet
  • 14:56 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host debmonitor2003.codfw.wmnet with OS bookworm
  • 14:44 hashar@deploy2002: Finished deploy [gerrit/gerrit@79dc8f5]: Add rename-project plugin - T201953 (duration: 00m 07s)
  • 14:44 hashar@deploy2002: Started deploy [gerrit/gerrit@79dc8f5]: Add rename-project plugin - T201953
  • 14:28 godog: bounce prometheus@k8s and @k8s-aux in eqiad - T343529
  • 14:26 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:18 hashar@deploy2002: Finished deploy [integration/docroot@8e28943]: Update phpunit and npm dependencies (noop for prod) (duration: 00m 06s)
  • 14:18 hashar@deploy2002: Started deploy [integration/docroot@8e28943]: Update phpunit and npm dependencies (noop for prod)
  • 14:16 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db1135.eqiad.wmnet onto db1235.eqiad.wmnet
  • 14:14 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mo Houtti out of all services on: 2205 hosts
  • 14:14 root@cumin2002: START - Cookbook sre.idm.logout Logging Mo Houtti out of all services on: 2205 hosts
  • 14:05 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
  • 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3314 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P56224 and previous config saved to /var/cache/conftool/dbconfig/20240205-140343-arnaudb.json
  • 14:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3315 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P56223 and previous config saved to /var/cache/conftool/dbconfig/20240205-140332-arnaudb.json
  • 14:02 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on debmonitor2003.codfw.wmnet with reason: host reimage
  • 13:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3314 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P56222 and previous config saved to /var/cache/conftool/dbconfig/20240205-134838-arnaudb.json
  • 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3315 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P56221 and previous config saved to /var/cache/conftool/dbconfig/20240205-134827-arnaudb.json
  • 13:45 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host debmonitor2003.codfw.wmnet with OS bookworm
  • 13:35 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3314 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P56219 and previous config saved to /var/cache/conftool/dbconfig/20240205-133333-arnaudb.json
  • 13:33 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3315 (re)pooling @ 50%: 10', diff saved to https://phabricator.wikimedia.org/P56218 and previous config saved to /var/cache/conftool/dbconfig/20240205-133322-arnaudb.json
  • 13:26 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3314 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P56217 and previous config saved to /var/cache/conftool/dbconfig/20240205-131828-arnaudb.json
  • 13:18 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3315 (re)pooling @ 40%: 10', diff saved to https://phabricator.wikimedia.org/P56216 and previous config saved to /var/cache/conftool/dbconfig/20240205-131817-arnaudb.json
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T355609)', diff saved to https://phabricator.wikimedia.org/P56215 and previous config saved to /var/cache/conftool/dbconfig/20240205-130956-marostegui.json
  • 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3314 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P56214 and previous config saved to /var/cache/conftool/dbconfig/20240205-130323-arnaudb.json
  • 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3315 (re)pooling @ 30%: 10', diff saved to https://phabricator.wikimedia.org/P56213 and previous config saved to /var/cache/conftool/dbconfig/20240205-130312-arnaudb.json
  • 13:01 moritzm: installing perf updates on bookworm hosts
  • 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P56212 and previous config saved to /var/cache/conftool/dbconfig/20240205-125450-marostegui.json
  • 12:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db1135 in db1235 for T344036', diff saved to https://phabricator.wikimedia.org/P56211 and previous config saved to /var/cache/conftool/dbconfig/20240205-125444-arnaudb.json
  • 12:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: provisionning db1235.eqiad.wmnet - T344036
  • 12:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: provisionning db1235.eqiad.wmnet - T344036
  • 12:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: provisionning db1235.eqiad.wmnet - T344036
  • 12:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: provisionning db1235.eqiad.wmnet - T344036
  • 12:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246:3314 (re)pooling @ 20%: 10', diff saved to https://phabricator.wikimedia.org/P56210 and previous config saved to /var/cache/conftool/dbconfig/20240205-124817-arnaudb.json
  • 12:48 arnaudb@cumin1002: dbctl commit (dc=all): 'db1244:3315 (re)pooling @ 20%: 10', diff saved to https://phabricator.wikimedia.org/P56209 and previous config saved to /var/cache/conftool/dbconfig/20240205-124807-arnaudb.json
  • 12:40 moritzm: installing bind9 security updates (client-side libs/tools only)
  • 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P56208 and previous config saved to /var/cache/conftool/dbconfig/20240205-123923-marostegui.json
  • 12:37 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1002.wikimedia.org
  • 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T355609)', diff saved to https://phabricator.wikimedia.org/P56207 and previous config saved to /var/cache/conftool/dbconfig/20240205-122416-marostegui.json
  • 12:23 moritzm: installing runc security updates on aux-k8s cluster
  • 12:22 jnuche@deploy2002: Installation of scap version "4.65.3" completed for 505 hosts
  • 12:21 jnuche@deploy2002: Installing scap version "4.65.3" for 505 hosts
  • 12:20 jnuche@deploy2002: Installing scap version "4.65.3" for 505 hosts
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T355609)', diff saved to https://phabricator.wikimedia.org/P56206 and previous config saved to /var/cache/conftool/dbconfig/20240205-121911-marostegui.json
  • 12:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T355609)', diff saved to https://phabricator.wikimedia.org/P56205 and previous config saved to /var/cache/conftool/dbconfig/20240205-121850-marostegui.json
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P56204 and previous config saved to /var/cache/conftool/dbconfig/20240205-120343-marostegui.json
  • 11:58 arturo: downgrade docker to v23 in thirdparty/kubeadm-k8s-1-23 for T356507 and T356629 in apt1001
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P56203 and previous config saved to /var/cache/conftool/dbconfig/20240205-114837-marostegui.json
  • 11:34 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:33 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Unracked cloudcephosd10[35-39,40] - volans@cumin1002"
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T355609)', diff saved to https://phabricator.wikimedia.org/P56201 and previous config saved to /var/cache/conftool/dbconfig/20240205-113330-marostegui.json
  • 11:33 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Unracked cloudcephosd10[35-39,40] - volans@cumin1002"
  • 11:30 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T355609)', diff saved to https://phabricator.wikimedia.org/P56200 and previous config saved to /var/cache/conftool/dbconfig/20240205-112812-marostegui.json
  • 11:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56199 and previous config saved to /var/cache/conftool/dbconfig/20240205-112750-marostegui.json
  • 11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P56198 and previous config saved to /var/cache/conftool/dbconfig/20240205-111243-marostegui.json
  • 10:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P56197 and previous config saved to /var/cache/conftool/dbconfig/20240205-105736-marostegui.json
  • 10:54 arturo: update thirdparty/kubeadm-k8s-1-23 packages for T356507 in apt1001
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56196 and previous config saved to /var/cache/conftool/dbconfig/20240205-104230-marostegui.json
  • 10:42 moritzm: installing runc security updates on releases hosts
  • 10:38 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56195 and previous config saved to /var/cache/conftool/dbconfig/20240205-103547-marostegui.json
  • 10:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T355609)', diff saved to https://phabricator.wikimedia.org/P56194 and previous config saved to /var/cache/conftool/dbconfig/20240205-103525-marostegui.json
  • 10:32 moritzm: installing runc security updates on DSE cluster
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P56193 and previous config saved to /var/cache/conftool/dbconfig/20240205-102018-marostegui.json
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P56192 and previous config saved to /var/cache/conftool/dbconfig/20240205-100511-marostegui.json
  • 09:54 jelto@cumin1002: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T355609)', diff saved to https://phabricator.wikimedia.org/P56191 and previous config saved to /var/cache/conftool/dbconfig/20240205-095005-marostegui.json
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T355609)', diff saved to https://phabricator.wikimedia.org/P56190 and previous config saved to /var/cache/conftool/dbconfig/20240205-094329-marostegui.json
  • 09:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56189 and previous config saved to /var/cache/conftool/dbconfig/20240205-094306-marostegui.json
  • 09:42 moritzm: installing runc security updates on gitlab runners
  • 09:38 jynus: restarted db2097
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P56188 and previous config saved to /var/cache/conftool/dbconfig/20240205-092800-marostegui.json
  • 09:23 jynus: INFO: About to transfer /srv/backups/snapshots/latest/snapshot.s3.2024-02-05--04-31-35.tar.gz from dbprov1001.eqiad.wmnet to ['db1240.eqiad.wmnet']:['/srv/sqldata.s3'] (403339334478 bytes)
  • 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P56187 and previous config saved to /var/cache/conftool/dbconfig/20240205-091253-marostegui.json
  • 09:12 moritzm: installing Java 8 security updates
  • 09:09 moritzm: installing perf updates on bullseye hosts
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56186 and previous config saved to /var/cache/conftool/dbconfig/20240205-085746-marostegui.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137:3315 (T355609)', diff saved to https://phabricator.wikimedia.org/P56185 and previous config saved to /var/cache/conftool/dbconfig/20240205-084949-marostegui.json
  • 08:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56184 and previous config saved to /var/cache/conftool/dbconfig/20240205-084927-marostegui.json
  • 08:42 zabe@deploy2002: Finished scap: Backport for specials: Remove null comments from formatter on Special:ProtectedPages (T356337), namespaceDupes: Reduce batchsize to 100 for link update (duration: 18m 18s)
  • 08:35 zabe@deploy2002: zabe and jforrester: Continuing with sync
  • 08:35 moritzm: uploaded openjdk-8 8u402-ga-2~deb11u1 (latest Java 8 security fixes for Bullseye)
  • 08:35 zabe@deploy2002: zabe and jforrester: Backport for specials: Remove null comments from formatter on Special:ProtectedPages (T356337), namespaceDupes: Reduce batchsize to 100 for link update synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56183 and previous config saved to /var/cache/conftool/dbconfig/20240205-083420-marostegui.json
  • 08:24 zabe@deploy2002: Started scap: Backport for specials: Remove null comments from formatter on Special:ProtectedPages (T356337), namespaceDupes: Reduce batchsize to 100 for link update
  • 08:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P56182 and previous config saved to /var/cache/conftool/dbconfig/20240205-081914-marostegui.json
  • 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nskaggs out of all services on: 2205 hosts
  • 08:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Nskaggs out of all services on: 2205 hosts
  • 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56181 and previous config saved to /var/cache/conftool/dbconfig/20240205-080407-marostegui.json
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T355609)', diff saved to https://phabricator.wikimedia.org/P56180 and previous config saved to /var/cache/conftool/dbconfig/20240205-075856-marostegui.json
  • 07:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56179 and previous config saved to /var/cache/conftool/dbconfig/20240205-075818-marostegui.json
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 07:55 zabe: zabe@mwmaint2002:/tmp/uploads$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user="Illegitimate Barrister" . # T356607
  • 07:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56178 and previous config saved to /var/cache/conftool/dbconfig/20240205-074312-marostegui.json
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P56177 and previous config saved to /var/cache/conftool/dbconfig/20240205-072805-marostegui.json
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56176 and previous config saved to /var/cache/conftool/dbconfig/20240205-071259-marostegui.json
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T355609)', diff saved to https://phabricator.wikimedia.org/P56175 and previous config saved to /var/cache/conftool/dbconfig/20240205-070745-marostegui.json
  • 07:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56174 and previous config saved to /var/cache/conftool/dbconfig/20240205-070723-marostegui.json
  • 06:56 marostegui: dbmaint Drop indexes on site table on s8 T356417
  • 06:56 marostegui: dbmaint Drop mathoid, mathlatexml tables T355050
  • 06:54 marostegui: Drop indexes on site table on s8 T356417
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56173 and previous config saved to /var/cache/conftool/dbconfig/20240205-065216-marostegui.json
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P56172 and previous config saved to /var/cache/conftool/dbconfig/20240205-063709-marostegui.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56171 and previous config saved to /var/cache/conftool/dbconfig/20240205-062203-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2111 (T355609)', diff saved to https://phabricator.wikimedia.org/P56170 and previous config saved to /var/cache/conftool/dbconfig/20240205-061511-marostegui.json
  • 06:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 06:11 marostegui: Drop mathoid, mathlatexml tables T355050
  • 06:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 00:59 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 00:59 rzl@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.

2024-02-04

  • 01:53 urandom: decommissioning cassandra, restbase2018-{a,b,c} — T352469
  • 01:49 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2018.codfw.wmnet with reason: Decommissioning — T352469
  • 01:49 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2018.codfw.wmnet with reason: Decommissioning — T352469

2024-02-03

  • 13:30 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2017.codfw.wmnet with reason: Decommissioning — T352469
  • 13:30 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2017.codfw.wmnet with reason: Decommissioning — T352469
  • 08:19 ryankemper: [cloudelastic] Replica shards have re-initialized; cluster is back to green. Will probably see a wall of `ElasticSearch unassigned shard check - 9400` resolve messages soon, fingers crossed
  • 08:15 ryankemper: [cloudelastic] Re-enabled replica allocation on `cloudelastic-omega-eqiad` => `curl -H 'Content-Type: application/json' -XPUT https://cloudelastic.wikimedia.org:9443/_cluster/settings -d '{"transient":{"cluster.routing.allocation":{"enable": "all"}}}'`
  • 08:10 ryankemper: [cloudelastic] Seeing `replica allocations are forbidden due to cluster setting [cluster.routing.allocation.enable=primaries]`; that likely explains the many unassigned shards of cloudelastic.wikimedia.org:9400 ... feels like a previous cookbook run didn't back out successfully, leaving replica allocation disabled
  • 08:09 ryankemper: [cloudelastic] current state: `{"cluster_name":"cloudelastic-omega-eqiad","status":"yellow","number_of_nodes":10,"number_of_data_nodes":10,"active_primary_shards":798,"active_shards":1438,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":160,"delayed_unassigned_shards":0,"active_shards_percent_as_number":89.98748435544431}`
  • 01:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:13 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56168 and previous config saved to /var/cache/conftool/dbconfig/20240203-011337-marostegui.json
  • 00:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56167 and previous config saved to /var/cache/conftool/dbconfig/20240203-005830-marostegui.json
  • 00:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P56166 and previous config saved to /var/cache/conftool/dbconfig/20240203-004324-marostegui.json
  • 00:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56165 and previous config saved to /var/cache/conftool/dbconfig/20240203-002817-marostegui.json
  • 00:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T355609)', diff saved to https://phabricator.wikimedia.org/P56164 and previous config saved to /var/cache/conftool/dbconfig/20240203-000314-marostegui.json
  • 00:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 00:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 00:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T355609)', diff saved to https://phabricator.wikimedia.org/P56163 and previous config saved to /var/cache/conftool/dbconfig/20240203-000252-marostegui.json
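
The cloudelastic entries from 08:09 to 08:19 above can be cross-checked with a short sketch. It is illustrative only, not the tooling that was used: it parses the health snapshot logged at 08:09, recomputes the active-shard percentage, and builds the transient settings body that re-enables allocation. The helper names `active_percent` and `allocation_body` are hypothetical, and the formula (active shards over active plus initializing plus unassigned) is an assumption that happens to reproduce the logged figure.

```python
import json

# Cluster health snapshot copied from the 08:09 log entry.
HEALTH_JSON = (
    '{"cluster_name":"cloudelastic-omega-eqiad","status":"yellow",'
    '"number_of_nodes":10,"number_of_data_nodes":10,'
    '"active_primary_shards":798,"active_shards":1438,'
    '"relocating_shards":0,"initializing_shards":0,'
    '"unassigned_shards":160,"delayed_unassigned_shards":0,'
    '"active_shards_percent_as_number":89.98748435544431}'
)


def active_percent(health: dict) -> float:
    """Percentage of shards that are active (hypothetical helper).

    Assumption: the total is active + initializing + unassigned shards.
    """
    total = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    return 100.0 * health["active_shards"] / total


def allocation_body(enable: str = "all") -> str:
    """JSON body for a transient allocation setting (hypothetical helper).

    Mirrors the structure of the payload in the 08:15 entry.
    """
    return json.dumps(
        {"transient": {"cluster.routing.allocation": {"enable": enable}}}
    )


health = json.loads(HEALTH_JSON)
print(health["status"], health["unassigned_shards"], active_percent(health))
print(allocation_body())
```

With 1438 active and 160 unassigned shards, the recomputed percentage agrees with the logged `active_shards_percent_as_number`, which is consistent with the 160 unassigned shards being replicas held back by the `primaries` allocation setting.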

2024-02-02

  • 23:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P56162 and previous config saved to /var/cache/conftool/dbconfig/20240202-234745-marostegui.json
  • 23:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P56161 and previous config saved to /var/cache/conftool/dbconfig/20240202-233239-marostegui.json
  • 23:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T355609)', diff saved to https://phabricator.wikimedia.org/P56160 and previous config saved to /var/cache/conftool/dbconfig/20240202-231732-marostegui.json
  • 22:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T355609)', diff saved to https://phabricator.wikimedia.org/P56159 and previous config saved to /var/cache/conftool/dbconfig/20240202-224357-marostegui.json
  • 22:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 22:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T355609)', diff saved to https://phabricator.wikimedia.org/P56158 and previous config saved to /var/cache/conftool/dbconfig/20240202-224334-marostegui.json
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P56157 and previous config saved to /var/cache/conftool/dbconfig/20240202-222828-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P56156 and previous config saved to /var/cache/conftool/dbconfig/20240202-221321-marostegui.json
  • 21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T355609)', diff saved to https://phabricator.wikimedia.org/P56155 and previous config saved to /var/cache/conftool/dbconfig/20240202-215815-marostegui.json
  • 21:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T355609)', diff saved to https://phabricator.wikimedia.org/P56154 and previous config saved to /var/cache/conftool/dbconfig/20240202-213504-marostegui.json
  • 21:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 21:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 21:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 21:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 20:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 20:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 20:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T355609)', diff saved to https://phabricator.wikimedia.org/P56153 and previous config saved to /var/cache/conftool/dbconfig/20240202-205722-marostegui.json
  • 20:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P56152 and previous config saved to /var/cache/conftool/dbconfig/20240202-204215-marostegui.json
  • 20:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P56151 and previous config saved to /var/cache/conftool/dbconfig/20240202-202709-marostegui.json
  • 20:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T355609)', diff saved to https://phabricator.wikimedia.org/P56150 and previous config saved to /var/cache/conftool/dbconfig/20240202-201202-marostegui.json
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T355609)', diff saved to https://phabricator.wikimedia.org/P56149 and previous config saved to /var/cache/conftool/dbconfig/20240202-194359-marostegui.json
  • 19:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 19:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 19:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T355609)', diff saved to https://phabricator.wikimedia.org/P56148 and previous config saved to /var/cache/conftool/dbconfig/20240202-194338-marostegui.json
  • 19:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P56147 and previous config saved to /var/cache/conftool/dbconfig/20240202-192831-marostegui.json
  • 19:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P56146 and previous config saved to /var/cache/conftool/dbconfig/20240202-191325-marostegui.json
  • 18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T355609)', diff saved to https://phabricator.wikimedia.org/P56145 and previous config saved to /var/cache/conftool/dbconfig/20240202-185818-marostegui.json
  • 18:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T355609)', diff saved to https://phabricator.wikimedia.org/P56144 and previous config saved to /var/cache/conftool/dbconfig/20240202-183510-marostegui.json
  • 18:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 18:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T355609)', diff saved to https://phabricator.wikimedia.org/P56143 and previous config saved to /var/cache/conftool/dbconfig/20240202-183448-marostegui.json
  • 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P56142 and previous config saved to /var/cache/conftool/dbconfig/20240202-181941-marostegui.json
  • 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P56141 and previous config saved to /var/cache/conftool/dbconfig/20240202-180435-marostegui.json
  • 18:02 ejegg: fundraising civicrm upgraded from f89f3a58 to 427c40f5
  • 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T355609)', diff saved to https://phabricator.wikimedia.org/P56140 and previous config saved to /var/cache/conftool/dbconfig/20240202-174929-marostegui.json
  • 17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T355609)', diff saved to https://phabricator.wikimedia.org/P56139 and previous config saved to /var/cache/conftool/dbconfig/20240202-172532-marostegui.json
  • 17:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 17:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T355609)', diff saved to https://phabricator.wikimedia.org/P56138 and previous config saved to /var/cache/conftool/dbconfig/20240202-172510-marostegui.json
  • 17:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P56137 and previous config saved to /var/cache/conftool/dbconfig/20240202-171003-marostegui.json
  • 16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P56136 and previous config saved to /var/cache/conftool/dbconfig/20240202-165457-marostegui.json
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T355609)', diff saved to https://phabricator.wikimedia.org/P56135 and previous config saved to /var/cache/conftool/dbconfig/20240202-163950-marostegui.json
  • 16:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T355609)', diff saved to https://phabricator.wikimedia.org/P56133 and previous config saved to /var/cache/conftool/dbconfig/20240202-161249-marostegui.json
  • 16:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T355609)', diff saved to https://phabricator.wikimedia.org/P56132 and previous config saved to /var/cache/conftool/dbconfig/20240202-161227-marostegui.json
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P56131 and previous config saved to /var/cache/conftool/dbconfig/20240202-155721-marostegui.json
  • 15:52 cgoubert@cumin2002: conftool action : set/pooled=yes; selector: name=mw1494.eqiad.wmnet,cluster=jobrunner
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P56130 and previous config saved to /var/cache/conftool/dbconfig/20240202-154214-marostegui.json
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T355609)', diff saved to https://phabricator.wikimedia.org/P56128 and previous config saved to /var/cache/conftool/dbconfig/20240202-152707-marostegui.json
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T355609)', diff saved to https://phabricator.wikimedia.org/P56127 and previous config saved to /var/cache/conftool/dbconfig/20240202-150236-marostegui.json
  • 15:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T355609)', diff saved to https://phabricator.wikimedia.org/P56126 and previous config saved to /var/cache/conftool/dbconfig/20240202-150155-marostegui.json
  • 14:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P56125 and previous config saved to /var/cache/conftool/dbconfig/20240202-144648-marostegui.json
  • 14:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P56123 and previous config saved to /var/cache/conftool/dbconfig/20240202-143139-marostegui.json
  • 14:21 urandom: decommissioning cassandra, restbase2017-{a,b,c} — T352469
  • 14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T355609)', diff saved to https://phabricator.wikimedia.org/P56122 and previous config saved to /var/cache/conftool/dbconfig/20240202-141632-marostegui.json
  • 13:56 jynus: INFO: About to transfer /srv/backups/snapshots/latest/snapshot.s4.2024-02-02--09-03-48.tar.gz from dbprov1003.eqiad.wmnet to ['db1245.eqiad.wmnet']:['/srv/sqldata.s4'] (575311400085 bytes)
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T355609)', diff saved to https://phabricator.wikimedia.org/P56121 and previous config saved to /var/cache/conftool/dbconfig/20240202-135300-marostegui.json
  • 13:53 jynus: INFO: About to transfer /srv/backups/snapshots/latest/snapshot.s1.2024-02-02--09-03-48.tar.gz from dbprov1001.eqiad.wmnet to ['db1239.eqiad.wmnet']:['/srv/sqldata.s1'] (478462104090 bytes)
  • 13:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 13:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56120 and previous config saved to /var/cache/conftool/dbconfig/20240202-135237-marostegui.json
  • 13:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P56119 and previous config saved to /var/cache/conftool/dbconfig/20240202-133730-marostegui.json
  • 13:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P56118 and previous config saved to /var/cache/conftool/dbconfig/20240202-132224-marostegui.json
  • 13:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 13:18 moritzm: installing Linux 4.19.304 updates on Buster hosts
  • 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2004.codfw.wmnet
  • 13:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56117 and previous config saved to /var/cache/conftool/dbconfig/20240202-130717-marostegui.json
  • 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2004.codfw.wmnet
  • 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
  • 12:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P56116 and previous config saved to /var/cache/conftool/dbconfig/20240202-125131-root.json
  • 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 12:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T355609)', diff saved to https://phabricator.wikimedia.org/P56115 and previous config saved to /var/cache/conftool/dbconfig/20240202-124243-marostegui.json
  • 12:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P56114 and previous config saved to /var/cache/conftool/dbconfig/20240202-123625-root.json
  • 12:35 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1496.eqiad.wmnet with OS bullseye
  • 12:31 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1419.eqiad.wmnet with OS bullseye
  • 12:29 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1488.eqiad.wmnet with OS bullseye
  • 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 12:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P56113 and previous config saved to /var/cache/conftool/dbconfig/20240202-122120-root.json
  • 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 12:18 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:18 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:16 claime: Restarting ferm.service on k8s node mw1424 - T354855
  • 12:15 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1496.eqiad.wmnet with reason: host reimage
  • 12:12 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: host reimage
  • 12:10 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1488.eqiad.wmnet with reason: host reimage
  • 12:08 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1496.eqiad.wmnet with reason: host reimage
  • 12:08 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: host reimage
  • 12:07 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1488.eqiad.wmnet with reason: host reimage
  • 12:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudlb2003-dev.codfw.wmnet
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P56112 and previous config saved to /var/cache/conftool/dbconfig/20240202-120615-root.json
  • 12:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 11:54 cgoubert@cumin2002: START - Cookbook sre.hosts.reimage for host mw1419.eqiad.wmnet with OS bullseye
  • 11:54 cgoubert@cumin2002: START - Cookbook sre.hosts.reimage for host mw1496.eqiad.wmnet with OS bullseye
  • 11:54 cgoubert@cumin2002: START - Cookbook sre.hosts.reimage for host mw1488.eqiad.wmnet with OS bullseye
  • 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P56111 and previous config saved to /var/cache/conftool/dbconfig/20240202-115110-root.json
  • 11:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1002.eqiad.wmnet
  • 11:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-airflow1002.eqiad.wmnet
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P56110 and previous config saved to /var/cache/conftool/dbconfig/20240202-113605-root.json
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::airflow::research
  • 11:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 1%: After schema change', diff saved to https://phabricator.wikimedia.org/P56109 and previous config saved to /var/cache/conftool/dbconfig/20240202-112100-root.json
  • 11:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::airflow::research
  • 11:07 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1002.eqiad.wmnet with OS bullseye
  • 10:55 Emperor: stop puppet and swift on ms-be2044-50 T353149
  • 10:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 7 hosts with reason: due for decomm
  • 10:53 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 7 hosts with reason: due for decomm
  • 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-airflow1004.eqiad.wmnet
  • 10:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-airflow1004.eqiad.wmnet
  • 10:44 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1002.eqiad.wmnet with reason: host reimage
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::airflow::platform_eng
  • 10:41 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1002.eqiad.wmnet with reason: host reimage
  • 10:33 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::airflow::platform_eng
  • 10:32 Emperor: restart codfw swift-container (-b 1 -s 3)
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56108 and previous config saved to /var/cache/conftool/dbconfig/20240202-102943-marostegui.json
  • 10:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 10:28 Emperor: restart codfw swift-account (-b 1 -s 3)
  • 10:27 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-airflow1002.eqiad.wmnet with OS bullseye
  • 10:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-airflow1004.eqiad.wmnet with OS bullseye
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56107 and previous config saved to /var/cache/conftool/dbconfig/20240202-101311-marostegui.json
  • 10:01 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1004.eqiad.wmnet with reason: host reimage
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56106 and previous config saved to /var/cache/conftool/dbconfig/20240202-095805-marostegui.json
  • 09:56 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1004.eqiad.wmnet with reason: host reimage
  • 09:45 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-airflow1004.eqiad.wmnet with OS bullseye
  • 09:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56104 and previous config saved to /var/cache/conftool/dbconfig/20240202-094258-marostegui.json
  • 09:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:35 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:35 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56103 and previous config saved to /var/cache/conftool/dbconfig/20240202-092752-marostegui.json
  • 09:18 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 09:09 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56102 and previous config saved to /var/cache/conftool/dbconfig/20240202-090209-marostegui.json
  • 09:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56101 and previous config saved to /var/cache/conftool/dbconfig/20240202-090146-marostegui.json
  • 08:57 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe1011.eqiad.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:57 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe1011.eqiad.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56100 and previous config saved to /var/cache/conftool/dbconfig/20240202-084640-marostegui.json
  • 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56099 and previous config saved to /var/cache/conftool/dbconfig/20240202-083133-marostegui.json
  • 08:30 tstarling@deploy2002: Synchronized wmf-config/CommonSettings.php: Enable UrlShortener QR code everywhere (T348487) (duration: 07m 23s)
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56098 and previous config saved to /var/cache/conftool/dbconfig/20240202-081626-marostegui.json
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56097 and previous config saved to /var/cache/conftool/dbconfig/20240202-071555-marostegui.json
  • 07:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56096 and previous config saved to /var/cache/conftool/dbconfig/20240202-071516-marostegui.json
  • 07:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56095 and previous config saved to /var/cache/conftool/dbconfig/20240202-070009-marostegui.json
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6003.wikimedia.org
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6003.wikimedia.org
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56094 and previous config saved to /var/cache/conftool/dbconfig/20240202-064502-marostegui.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Change db1163 weight', diff saved to https://phabricator.wikimedia.org/P56093 and previous config saved to /var/cache/conftool/dbconfig/20240202-063858-marostegui.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Change db1163 weight', diff saved to https://phabricator.wikimedia.org/P56092 and previous config saved to /var/cache/conftool/dbconfig/20240202-063844-marostegui.json
  • 06:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1106.eqiad.wmnet
  • 06:14 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:14 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1106.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 06:13 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1106.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
  • 06:11 marostegui@cumin1002: START - Cookbook sre.dns.netbox
  • 06:06 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1106.eqiad.wmnet
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56090 and previous config saved to /var/cache/conftool/dbconfig/20240202-060504-marostegui.json
  • 06:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 00:23 thcipriani@deploy2002: Finished scap: Backport for wikitech: Update Gerrit blocking logic (T307558) (duration: 09m 06s)
  • 00:16 thcipriani@deploy2002: thcipriani and bd808: Continuing with sync
  • 00:15 thcipriani@deploy2002: thcipriani and bd808: Backport for wikitech: Update Gerrit blocking logic (T307558) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:14 thcipriani@deploy2002: Started scap: Backport for wikitech: Update Gerrit blocking logic (T307558)

2024-02-01

  • 22:54 bking@cumin2002: conftool action : set/weight=10; selector: name=cloudelastic1010.eqiad.wmnet
  • 22:52 bking@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cloudelastic,name=cloudelastic1010.eqiad.wmnet
  • 22:51 bking@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=cloudelastic,name=cloudelastic1003.wikimedia.org
  • 22:51 bking@cumin2002: conftool action : set/weight=10; selector: name=cloudelastic1010.
  • 21:57 mutante: LDAP - added wmdecyn to wmde and nda groups T355937
  • 21:38 tchanders@deploy2002: Finished scap: Backport for Set $wgEnablePartialActionBlocks true for most wikis (T353495) (duration: 10m 04s)
  • 21:32 tchanders@deploy2002: tchanders: Continuing with sync
  • 21:30 tchanders@deploy2002: tchanders: Backport for Set $wgEnablePartialActionBlocks true for most wikis (T353495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:28 tchanders@deploy2002: Started scap: Backport for Set $wgEnablePartialActionBlocks true for most wikis (T353495)
  • 21:21 urbanecm@deploy2002: Finished scap: Backport for Add testwiki config to test Contact page for account vanishing. (T343536) (duration: 09m 10s)
  • 21:15 urbanecm@deploy2002: urbanecm and dbrant: Continuing with sync
  • 21:14 urbanecm@deploy2002: urbanecm and dbrant: Backport for Add testwiki config to test Contact page for account vanishing. (T343536) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:12 urbanecm@deploy2002: Started scap: Backport for Add testwiki config to test Contact page for account vanishing. (T343536)
  • 21:12 urbanecm@deploy2002: Finished scap: Backport for New stream config for mobileapps Places feature (T351165) (duration: 09m 21s)
  • 21:10 eileen: civicrm upgraded from 21bf2138 to f89f3a58
  • 21:05 urbanecm@deploy2002: sharvaniharan and urbanecm: Continuing with sync
  • 21:04 urbanecm@deploy2002: sharvaniharan and urbanecm: Backport for New stream config for mobileapps Places feature (T351165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:02 urbanecm@deploy2002: Started scap: Backport for New stream config for mobileapps Places feature (T351165)
  • 20:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T355609)', diff saved to https://phabricator.wikimedia.org/P56089 and previous config saved to /var/cache/conftool/dbconfig/20240201-201304-marostegui.json
  • 19:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P56088 and previous config saved to /var/cache/conftool/dbconfig/20240201-195758-marostegui.json
  • 19:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P56087 and previous config saved to /var/cache/conftool/dbconfig/20240201-194251-marostegui.json
  • 19:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T355609)', diff saved to https://phabricator.wikimedia.org/P56086 and previous config saved to /var/cache/conftool/dbconfig/20240201-192745-marostegui.json
  • 19:12 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.42.0-wmf.16 refs T354434
  • 19:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T355609)', diff saved to https://phabricator.wikimedia.org/P56085 and previous config saved to /var/cache/conftool/dbconfig/20240201-190419-marostegui.json
  • 19:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 19:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 19:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T355609)', diff saved to https://phabricator.wikimedia.org/P56084 and previous config saved to /var/cache/conftool/dbconfig/20240201-190357-marostegui.json
  • 18:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P56083 and previous config saved to /var/cache/conftool/dbconfig/20240201-184850-marostegui.json
  • 18:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db1146.eqiad.wmnet onto db1246.eqiad.wmnet
  • 18:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P56082 and previous config saved to /var/cache/conftool/dbconfig/20240201-183343-marostegui.json
  • 18:32 sukhe: running dummy authdns-update
  • 18:24 aokoth@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM vrts2001.codfw.wmnet
  • 18:20 aokoth@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM vrts2001.codfw.wmnet
  • 18:19 aokoth@cumin2002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM vrts2001.codfw.wmnet
  • 18:19 aokoth@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM vrts2001.codfw.wmnet
  • 18:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T355609)', diff saved to https://phabricator.wikimedia.org/P56081 and previous config saved to /var/cache/conftool/dbconfig/20240201-181837-marostegui.json
  • 18:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:15 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T355609)', diff saved to https://phabricator.wikimedia.org/P56079 and previous config saved to /var/cache/conftool/dbconfig/20240201-175303-marostegui.json
  • 17:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56078 and previous config saved to /var/cache/conftool/dbconfig/20240201-175241-marostegui.json
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56077 and previous config saved to /var/cache/conftool/dbconfig/20240201-173735-marostegui.json
  • 17:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P56076 and previous config saved to /var/cache/conftool/dbconfig/20240201-172228-marostegui.json
  • 17:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56075 and previous config saved to /var/cache/conftool/dbconfig/20240201-170722-marostegui.json
  • 16:55 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2088.codfw.wmnet with OS bullseye
  • 16:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) Will create a clone of db1144.eqiad.wmnet onto db1244.eqiad.wmnet
  • 16:42 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2094.codfw.wmnet with OS bullseye
  • 16:38 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bullseye
  • 16:38 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 16:35 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 16:33 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2449.codfw.wmnet with OS bullseye
  • 16:30 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2447.codfw.wmnet with OS bullseye
  • 16:26 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2448.codfw.wmnet with OS bullseye
  • 16:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host elastic2088.codfw.wmnet with OS bullseye
  • 16:16 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:15 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:13 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2449.codfw.wmnet with reason: host reimage
  • 16:12 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2088.codfw.wmnet with OS bullseye
  • 16:11 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2447.codfw.wmnet with reason: host reimage
  • 16:11 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:10 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 16:10 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 16:09 cgoubert@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2448.codfw.wmnet with reason: host reimage
  • 16:09 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:09 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:09 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T355609)', diff saved to https://phabricator.wikimedia.org/P56074 and previous config saved to /var/cache/conftool/dbconfig/20240201-160650-marostegui.json
  • 16:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:08 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 16:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 16:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56073 and previous config saved to /var/cache/conftool/dbconfig/20240201-160600-marostegui.json
  • 16:06 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2449.codfw.wmnet with reason: host reimage
  • 16:05 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2447.codfw.wmnet with reason: host reimage
  • 16:05 cgoubert@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2448.codfw.wmnet with reason: host reimage
  • 16:04 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1106 from dbctl T327616', diff saved to https://phabricator.wikimedia.org/P56072 and previous config saved to /var/cache/conftool/dbconfig/20240201-155203-marostegui.json
  • 15:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56071 and previous config saved to /var/cache/conftool/dbconfig/20240201-155054-marostegui.json
  • 15:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:49 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:49 cgoubert@cumin2002: START - Cookbook sre.hosts.reimage for host mw2447.codfw.wmnet with OS bullseye
  • 15:48 cgoubert@cumin2002: START - Cookbook sre.hosts.reimage for host mw2449.codfw.wmnet with OS bullseye
  • 15:48 cgoubert@cumin2002: START - Cookbook sre.hosts.reimage for host mw2448.codfw.wmnet with OS bullseye
  • 15:47 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P56070 and previous config saved to /var/cache/conftool/dbconfig/20240201-153547-marostegui.json
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56069 and previous config saved to /var/cache/conftool/dbconfig/20240201-152040-marostegui.json
  • 15:20 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:20 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 15:13 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 15:12 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 14:52 btullis@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic2094.codfw.wmnet']
  • 14:51 btullis@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2094.codfw.wmnet with OS bullseye
  • 14:50 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host elastic2088.codfw.wmnet with OS bullseye
  • 14:46 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic2088.codfw.wmnet with OS bullseye
  • 14:42 filippo@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:42 filippo@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 14:31 btullis@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094.codfw.wmnet']
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T355609)', diff saved to https://phabricator.wikimedia.org/P56068 and previous config saved to /var/cache/conftool/dbconfig/20240201-142009-marostegui.json
  • 14:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:18 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db1146.eqiad.wmnet onto db1246.eqiad.wmnet
  • 14:16 btullis@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic2094.codfw.wmnet']
  • 14:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db1146 in db1246 for T350458', diff saved to https://phabricator.wikimedia.org/P56067 and previous config saved to /var/cache/conftool/dbconfig/20240201-141531-arnaudb.json
  • 14:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: provisionning db1246.eqiad.wmnet - T350458
  • 14:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: provisionning db1246.eqiad.wmnet - T350458
  • 14:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: provisionning db1246.eqiad.wmnet - T350458
  • 14:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: provisionning db1246.eqiad.wmnet - T350458
  • 14:02 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 14:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56066 and previous config saved to /var/cache/conftool/dbconfig/20240201-135951-marostegui.json
  • 13:47 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P56065 and previous config saved to /var/cache/conftool/dbconfig/20240201-134445-marostegui.json
  • 13:44 btullis@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094.codfw.wmnet']
  • 13:44 btullis@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2094.codfw.wmnet']
  • 13:44 arnaudb@cumin1002: START - Cookbook sre.mysql.clone Will create a clone of db1144.eqiad.wmnet onto db1244.eqiad.wmnet
  • 13:42 btullis@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2094.codfw.wmnet']
  • 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db1144 in db1244 for T350458', diff saved to https://phabricator.wikimedia.org/P56064 and previous config saved to /var/cache/conftool/dbconfig/20240201-134107-arnaudb.json
  • 13:40 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 13:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: provisionning db1244.eqiad.wmnet - T350458
  • 13:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: provisionning db1244.eqiad.wmnet - T350458
  • 13:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: provisionning db1244.eqiad.wmnet - T350458
  • 13:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: provisionning db1244.eqiad.wmnet - T350458
  • 13:35 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudlb2002-dev.codfw.wmnet
  • 13:33 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 13:31 btullis@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2094.codfw.wmnet with OS bullseye
  • 13:31 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 13:30 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T350458
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P56062 and previous config saved to /var/cache/conftool/dbconfig/20240201-132938-marostegui.json
  • 13:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T350458
  • 13:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T350458
  • 13:29 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: provisionning db1234.eqiad.wmnet - T350458
  • 13:27 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:26 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:25 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 13:24 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 13:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
  • 13:22 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2088.codfw.wmnet with reason: host reimage
  • 13:20 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 13:16 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
  • 13:16 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
  • 13:15 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56061 and previous config saved to /var/cache/conftool/dbconfig/20240201-131432-marostegui.json
  • 13:08 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
  • 13:03 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:03 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:59 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:59 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudlb2001-dev.codfw.wmnet
  • 12:58 btullis@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2088.codfw.wmnet']
  • 12:57 btullis@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2088.codfw.wmnet']
  • 12:55 btullis@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic2088.codfw.wmnet']
  • 12:54 btullis@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2088.codfw.wmnet']
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56060 and previous config saved to /var/cache/conftool/dbconfig/20240201-124928-marostegui.json
  • 12:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56059 and previous config saved to /var/cache/conftool/dbconfig/20240201-124906-marostegui.json
  • 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
  • 12:36 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host elastic2088.codfw.wmnet with OS bullseye
  • 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56058 and previous config saved to /var/cache/conftool/dbconfig/20240201-123400-marostegui.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp1001.eqiad.wmnet
  • 12:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P56057 and previous config saved to /var/cache/conftool/dbconfig/20240201-121853-marostegui.json
  • 12:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp1001.eqiad.wmnet
  • 12:17 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host arclamp2001.codfw.wmnet
  • 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host arclamp2001.codfw.wmnet
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56056 and previous config saved to /var/cache/conftool/dbconfig/20240201-120346-marostegui.json
  • 12:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 11:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:21 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
  • 11:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: analytics_cluster::hadoop::yarn
  • 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137:3314 (T355609)', diff saved to https://phabricator.wikimedia.org/P56054 and previous config saved to /var/cache/conftool/dbconfig/20240201-110315-marostegui.json
  • 11:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T355609)', diff saved to https://phabricator.wikimedia.org/P56053 and previous config saved to /var/cache/conftool/dbconfig/20240201-110252-marostegui.json
  • 10:54 phuedx@deploy2002: Finished deploy [analytics/refinery@0d8e976] (hadoop-test): Remove trvwikisource from scoop list (duration: 03m 30s)
  • 10:52 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: analytics_cluster::hadoop::yarn
  • 10:51 phuedx@deploy2002: Started deploy [analytics/refinery@0d8e976] (hadoop-test): Remove trvwikisource from scoop list
  • 10:50 phuedx@deploy2002: Finished deploy [analytics/refinery@0d8e976] (thin): Remove trvwikisource from scoop list (duration: 00m 05s)
  • 10:50 phuedx@deploy2002: Started deploy [analytics/refinery@0d8e976] (thin): Remove trvwikisource from scoop list
  • 10:49 phuedx@deploy2002: Finished deploy [analytics/refinery@0d8e976]: analytics/refinery: Remove trvwikisource from scoop list (duration: 10m 20s)
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P56052 and previous config saved to /var/cache/conftool/dbconfig/20240201-104746-marostegui.json
  • 10:39 phuedx@deploy2002: Started deploy [analytics/refinery@0d8e976]: analytics/refinery: Remove trvwikisource from scoop list
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P56051 and previous config saved to /var/cache/conftool/dbconfig/20240201-103239-marostegui.json
  • 10:32 moritzm: installing openjdk-11 security updates
  • 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T355609)', diff saved to https://phabricator.wikimedia.org/P56049 and previous config saved to /var/cache/conftool/dbconfig/20240201-101733-marostegui.json
  • 10:11 hashar: Restarting CI Jenkins on contint2002
  • 10:10 btullis@deploy2002: Finished deploy [analytics/superset/deploy@26c0d49]: (no justification provided) (duration: 00m 59s)
  • 10:09 btullis@deploy2002: Started deploy [analytics/superset/deploy@26c0d49]: (no justification provided)
  • 10:01 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: JRE update for DSA 5604 - klausman@cumin2002
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T355609)', diff saved to https://phabricator.wikimedia.org/P56048 and previous config saved to /var/cache/conftool/dbconfig/20240201-095150-marostegui.json
  • 09:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 09:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T355609)', diff saved to https://phabricator.wikimedia.org/P56047 and previous config saved to /var/cache/conftool/dbconfig/20240201-095128-marostegui.json
  • 09:49 joal@deploy2002: Finished deploy [airflow-dags/analytics@6b84b7a]: (no justification provided) (duration: 00m 28s)
  • 09:49 joal@deploy2002: Started deploy [airflow-dags/analytics@6b84b7a]: (no justification provided)
  • 09:43 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: JRE update for DSA 5604 - klausman@cumin2002
  • 09:43 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: JRE update for DSA 5604 - klausman@cumin2002
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P56046 and previous config saved to /var/cache/conftool/dbconfig/20240201-093621-marostegui.json
  • 09:30 vgutierrez@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1001.eqiad.wmnet
  • 09:26 vgutierrez@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief1001.eqiad.wmnet
  • 09:25 klausman@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: JRE update for DSA 5604 - klausman@cumin2002
  • 09:24 vgutierrez@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1002.eqiad.wmnet
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P56045 and previous config saved to /var/cache/conftool/dbconfig/20240201-092115-marostegui.json
  • 09:20 vgutierrez@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief1002.eqiad.wmnet
  • 09:18 vgutierrez@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2001.codfw.wmnet
  • 09:14 vgutierrez@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief2001.codfw.wmnet
  • 09:12 vgutierrez@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2002.codfw.wmnet
  • 09:08 vgutierrez@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief2002.codfw.wmnet
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T355609)', diff saved to https://phabricator.wikimedia.org/P56044 and previous config saved to /var/cache/conftool/dbconfig/20240201-090607-marostegui.json
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After switchover', diff saved to https://phabricator.wikimedia.org/P56043 and previous config saved to /var/cache/conftool/dbconfig/20240201-085743-root.json
  • 08:52 hashar: Restarted primary Gerrit on gerrit1003
  • 08:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After switchover', diff saved to https://phabricator.wikimedia.org/P56042 and previous config saved to /var/cache/conftool/dbconfig/20240201-084238-root.json
  • 08:42 hashar: Restarting Gerrit replica on gerrit2002
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2119 (T355609)', diff saved to https://phabricator.wikimedia.org/P56041 and previous config saved to /var/cache/conftool/dbconfig/20240201-084126-marostegui.json
  • 08:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T355609)', diff saved to https://phabricator.wikimedia.org/P56040 and previous config saved to /var/cache/conftool/dbconfig/20240201-084104-marostegui.json
  • 08:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
  • 08:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After switchover', diff saved to https://phabricator.wikimedia.org/P56039 and previous config saved to /var/cache/conftool/dbconfig/20240201-082733-root.json
  • 08:26 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
  • 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P56038 and previous config saved to /var/cache/conftool/dbconfig/20240201-082558-marostegui.json
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 08:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After switchover', diff saved to https://phabricator.wikimedia.org/P56036 and previous config saved to /var/cache/conftool/dbconfig/20240201-081228-root.json
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P56035 and previous config saved to /var/cache/conftool/dbconfig/20240201-081051-marostegui.json
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After switchover', diff saved to https://phabricator.wikimedia.org/P56034 and previous config saved to /var/cache/conftool/dbconfig/20240201-075723-root.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T355609)', diff saved to https://phabricator.wikimedia.org/P56033 and previous config saved to /var/cache/conftool/dbconfig/20240201-075545-marostegui.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 100%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56032 and previous config saved to /var/cache/conftool/dbconfig/20240201-074520-root.json
  • 07:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After switchover', diff saved to https://phabricator.wikimedia.org/P56031 and previous config saved to /var/cache/conftool/dbconfig/20240201-074218-root.json
  • 07:33 slyngs: Failover debmonitor to new Bookworm host
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2110 (T355609)', diff saved to https://phabricator.wikimedia.org/P56030 and previous config saved to /var/cache/conftool/dbconfig/20240201-073053-marostegui.json
  • 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 07:30 moritzm: installing openjdk-11 security updates
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T355609)', diff saved to https://phabricator.wikimedia.org/P56029 and previous config saved to /var/cache/conftool/dbconfig/20240201-073031-marostegui.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 75%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56028 and previous config saved to /var/cache/conftool/dbconfig/20240201-073015-root.json
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2104 (re)pooling @ 1%: After switchover', diff saved to https://phabricator.wikimedia.org/P56027 and previous config saved to /var/cache/conftool/dbconfig/20240201-072713-root.json
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T356374', diff saved to https://phabricator.wikimedia.org/P56026 and previous config saved to /var/cache/conftool/dbconfig/20240201-071934-root.json
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2107 to s2 primary and set section read-write T356374', diff saved to https://phabricator.wikimedia.org/P56025 and previous config saved to /var/cache/conftool/dbconfig/20240201-071831-marostegui.json
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T356374', diff saved to https://phabricator.wikimedia.org/P56024 and previous config saved to /var/cache/conftool/dbconfig/20240201-071807-marostegui.json
  • 07:17 marostegui: Starting s2 codfw failover from db2104 to db2107 - T356374
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P56023 and previous config saved to /var/cache/conftool/dbconfig/20240201-071524-marostegui.json
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 50%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56022 and previous config saved to /var/cache/conftool/dbconfig/20240201-071510-root.json
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 07:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 07:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 07:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 07:01 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 07:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 07:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s2 T356374
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2107 with weight 0 T356374', diff saved to https://phabricator.wikimedia.org/P56021 and previous config saved to /var/cache/conftool/dbconfig/20240201-070057-marostegui.json
  • 07:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s2 T356374
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P56020 and previous config saved to /var/cache/conftool/dbconfig/20240201-070018-marostegui.json
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 25%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56019 and previous config saved to /var/cache/conftool/dbconfig/20240201-070005-root.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T355609)', diff saved to https://phabricator.wikimedia.org/P56018 and previous config saved to /var/cache/conftool/dbconfig/20240201-064511-marostegui.json
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 10%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56017 and previous config saved to /var/cache/conftool/dbconfig/20240201-064500-root.json
  • 06:34 marostegui@deploy2002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T356068) (duration: 08m 53s)
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 5%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56016 and previous config saved to /var/cache/conftool/dbconfig/20240201-062955-root.json
  • 06:28 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:27 marostegui@deploy2002: marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T356068) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:25 marostegui@deploy2002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 master (T356068)
  • 06:22 marostegui@deploy2002: Finished scap: Backport for Revert "db-production.php: Disable writes on es5" (duration: 08m 10s)
  • 06:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Primary switchover pc1 T356068
  • 06:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2011,2014].codfw.wmnet,pc[1011,1014].eqiad.wmnet with reason: Primary switchover pc1 T356068
  • 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2106 (T355609)', diff saved to https://phabricator.wikimedia.org/P56015 and previous config saved to /var/cache/conftool/dbconfig/20240201-062128-marostegui.json
  • 06:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:15 marostegui@deploy2002: marostegui: Continuing with sync
  • 06:15 marostegui@deploy2002: marostegui: Backport for Revert "db-production.php: Disable writes on es5" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2024 (re)pooling @ 1%: After switchover ', diff saved to https://phabricator.wikimedia.org/P56014 and previous config saved to /var/cache/conftool/dbconfig/20240201-061449-root.json
  • 06:13 marostegui@deploy2002: Started scap: Backport for Revert "db-production.php: Disable writes on es5"
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2024 T356235', diff saved to https://phabricator.wikimedia.org/P56013 and previous config saved to /var/cache/conftool/dbconfig/20240201-061041-root.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2023 to es5 primary T356235', diff saved to https://phabricator.wikimedia.org/P56012 and previous config saved to /var/cache/conftool/dbconfig/20240201-060853-root.json
  • 06:07 marostegui: Starting es5 codfw failover from es2024 to es2023 - T356235
  • 06:02 marostegui@deploy2002: Finished scap: Backport for db-production.php: Disable writes on es5 (T356235) (duration: 08m 19s)
  • 05:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 05:55 marostegui@deploy2002: marostegui: Continuing with sync
  • 05:55 marostegui@deploy2002: marostegui: Backport for db-production.php: Disable writes on es5 (T356235) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:53 marostegui@deploy2002: Started scap: Backport for db-production.php: Disable writes on es5 (T356235)
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2023 with weight 0 T356235', diff saved to https://phabricator.wikimedia.org/P56011 and previous config saved to /var/cache/conftool/dbconfig/20240201-055240-marostegui.json
  • 05:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T356235
  • 05:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T356235
  • 02:55 ejegg: civicrm upgraded from 6e1e0d21 to 21bf2138
  • 00:20 urbanecm@deploy2002: Finished scap: Backport for testwiki: Enable conditional defaults for 4 Echo properties (T353225) (duration: 08m 09s)
  • 00:13 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 00:13 urbanecm@deploy2002: urbanecm: Backport for testwiki: Enable conditional defaults for 4 Echo properties (T353225) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:12 urbanecm@deploy2002: Started scap: Backport for testwiki: Enable conditional defaults for 4 Echo properties (T353225)

