Server Admin Log/Archive 49

From Wikitech

2022-02-28

  • 22:36 ebernhardson: start in-place reindex of kmwiki, kmwiktionary and kmwikibooks on cirrus cloudelastic cluster T299707
  • 22:00 tzatziki: running extensions/SecurePoll/cli/wm-scripts/ucoc/populateEditCount.php on each wiki (s1 thru s8 simultaneously) (T302433)
  • 21:39 urbanecm: UTC late B&C window done
  • 21:38 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/VisualEditor/modules/ve-mw/init/targets: e22e4d5: b4dd4c4: VisualEditor backports (T302746) (duration: 00m 51s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/includes/htmlform/: 67831a3: Revert "htmlform: Replace some uses of isHidden to isDisabled" (T302512) (duration: 00m 48s)
  • 21:24 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/GrowthExperiments/includes/Specials/SpecialMentorDashboard.php: 706c2bc: Mentor dashboard: Mark mentor-tools as stable (T280307) (duration: 00m 49s)
  • 20:45 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 20:45 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 20:21 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 20:20 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 20:03 tzatziki: creating ucoc_edits table on each wiki for elections voterlist (T302433)
  • 19:51 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host datahubsearch1003.eqiad.wmnet
  • 19:50 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:50 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:44 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:38 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:28 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 19:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 19:18 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1003.eqiad.wmnet
  • 19:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:09 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:05 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 bblack@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 18:52 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2007.codfw.wmnet
  • 18:47 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 7 days, 0:00:00 on datahubsearch1002.eqiad.wmnet with reason: Node is being set up for first time and puppet run failed
  • 18:38 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch1002.eqiad.wmnet with reason: Node is being set up for first time and puppet run failed
  • 18:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 18:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2007.codfw.wmnet
  • 18:20 mutante: phabricator/diffusion - disable IO and hide http and ssh URIs for source repo 'word2vec' - it's still possible to pull and push via https (operations/debs/word2vec) - https://phabricator.wikimedia.org/source/word2vec/ - https://en.wikipedia.org/wiki/Word2vec T296022
  • 18:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2007.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2007.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:04 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host datahubsearch1002.eqiad.wmnet
  • 17:59 mutante: phabricator/diffusion - disable http and ssh URIs for source repo "iltools" - T296022 - https://commons.wikimedia.org/wiki/User_talk:Inductiveload#c-Inductiveload-2022-02-25T22%3A26%3A00.000Z-Mutante-2022-02-25T20%3A37%3A00.000Z
  • 17:51 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:48 bblack: lvs1017-20 (all eqiad lvs) - stopping puppet to attempt deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/765311
  • 17:47 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:45 sukhe: rolling restart of anycast-hc.service on durum* hosts for security updates
  • 17:42 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:40 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:40 sukhe: rolling restart of anycast-hc.service on doh* hosts for security updates
  • 17:35 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:35 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
  • 17:35 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:28 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2022.codfw.wmnet with OS bullseye
  • 17:21 ebernhardson: manual trigger of cirrus SaneitizeJobs for with 2hr refresh
  • 17:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:15 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host datahubsearch1002.eqiad.wmnet
  • 17:15 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 17:08 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 17:05 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:05 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:56 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:56 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
  • 16:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2022.codfw.wmnet with OS bullseye
  • 16:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:45 papaul: rebooting scs-a1-codfw to clear librenms alert
  • 16:42 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2001.codfw.wmnet
  • 16:42 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:33 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 16:32 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 16:27 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
  • 16:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2021.codfw.wmnet with OS bullseye
  • 16:13 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:07 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 16:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 16:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 16:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:59 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:58 klausman@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ml-etcd-staging2001
  • 15:56 vgutierrez: rolling upgrade to HAProxy 2.4.14 on HAProxy caching nodes - T290005
  • 15:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:53 klausman@cumin2002: START - Cookbook sre.hosts.decommission for hosts ml-etcd-staging2001
  • 15:53 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts ml-etcd-staging2001
  • 15:52 klausman@cumin2002: START - Cookbook sre.hosts.decommission for hosts ml-etcd-staging2001
  • 15:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2021.codfw.wmnet with OS bullseye
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:33 milimetric@deploy1002: Finished deploy [analytics/refinery@84a0770] (hadoop-test): Add a few wikis to the sqoop list (duration: 07m 16s)
  • 15:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:26 milimetric@deploy1002: Started deploy [analytics/refinery@84a0770] (hadoop-test): Add a few wikis to the sqoop list
  • 15:25 milimetric@deploy1002: Finished deploy [analytics/refinery@84a0770] (thin): Add a few wikis to the sqoop list (duration: 00m 08s)
  • 15:25 milimetric@deploy1002: Started deploy [analytics/refinery@84a0770] (thin): Add a few wikis to the sqoop list
  • 15:23 milimetric@deploy1002: Finished deploy [analytics/refinery@84a0770]: Add a few wikis to the sqoop list (duration: 21m 18s)
  • 15:18 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host kubernetes2020.codfw.wmnet with OS bullseye
  • 15:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 15:06 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@0a2ffb8]: (no justification provided) (duration: 00m 07s)
  • 15:06 ntsako@deploy1002: Started deploy [airflow-dags/analytics@0a2ffb8]: (no justification provided)
  • 15:04 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 15:02 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I616f56 (duration: 00m 49s)
  • 15:02 milimetric@deploy1002: Started deploy [analytics/refinery@84a0770]: Add a few wikis to the sqoop list
  • 14:53 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2020.codfw.wmnet with OS bullseye
  • 14:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2019.codfw.wmnet with OS bullseye
  • 14:44 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:43 klausman@cumin2001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-etcd-staging2001.codfw.wmnet
  • 14:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 14:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 14:33 klausman@cumin2001: START - Cookbook sre.ganeti.makevm for new host ml-etcd-staging2001.codfw.wmnet
  • 14:20 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2019.codfw.wmnet with OS bullseye
  • 14:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2018.codfw.wmnet with OS bullseye
  • 14:09 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 14:09 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 14:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2018.codfw.wmnet with reason: host reimage
  • 14:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2018.codfw.wmnet with reason: host reimage
  • 14:03 jelto: update gitlab-ce to 14.7.4 on all GitLab hosts
  • 14:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@75e8eb7]: (no justification provided) (duration: 00m 14s)
  • 14:00 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 14:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@75e8eb7]: (no justification provided)
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21600 and previous config saved to /var/cache/conftool/dbconfig/20220228-135158-ladsgroup.json
  • 13:50 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2018.codfw.wmnet with OS bullseye
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21599 and previous config saved to /var/cache/conftool/dbconfig/20220228-133653-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21598 and previous config saved to /var/cache/conftool/dbconfig/20220228-132148-ladsgroup.json
  • 13:14 moritzm: restarting apache on puppet masters to pick up expat security update
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21597 and previous config saved to /var/cache/conftool/dbconfig/20220228-130644-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1111.eqiad.wmnet with OS bullseye
  • 12:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1111.eqiad.wmnet with reason: host reimage
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Your commit message', diff saved to https://phabricator.wikimedia.org/P21596 and previous config saved to /var/cache/conftool/dbconfig/20220228-124454-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1111.eqiad.wmnet with reason: host reimage
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1111.eqiad.wmnet with OS bullseye
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T302185)', diff saved to https://phabricator.wikimedia.org/P21594 and previous config saved to /var/cache/conftool/dbconfig/20220228-123008-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 12:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5011.eqsin.wmnet with OS buster
  • 12:24 vgutierrez: pool cp5011 running HAProxy as TLS termination layer - T290005 T271421
  • 12:22 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia - T290005
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300992)', diff saved to https://phabricator.wikimedia.org/P21593 and previous config saved to /var/cache/conftool/dbconfig/20220228-122039-ladsgroup.json
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P21592 and previous config saved to /var/cache/conftool/dbconfig/20220228-120535-ladsgroup.json
  • 11:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5011.eqsin.wmnet with reason: host reimage
  • 11:55 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5011.eqsin.wmnet with reason: host reimage
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P21591 and previous config saved to /var/cache/conftool/dbconfig/20220228-115030-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21590 and previous config saved to /var/cache/conftool/dbconfig/20220228-114230-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300992)', diff saved to https://phabricator.wikimedia.org/P21589 and previous config saved to /var/cache/conftool/dbconfig/20220228-113525-ladsgroup.json
  • 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5011.eqsin.wmnet with OS buster
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21588 and previous config saved to /var/cache/conftool/dbconfig/20220228-112726-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300992)', diff saved to https://phabricator.wikimedia.org/P21587 and previous config saved to /var/cache/conftool/dbconfig/20220228-111700-ladsgroup.json
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1088.eqiad.wmnet with OS buster
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21586 and previous config saved to /var/cache/conftool/dbconfig/20220228-111221-ladsgroup.json
  • 11:09 vgutierrez: pool cp1088 running HAProxy as TLS termination layer - T290005 T271421
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21585 and previous config saved to /var/cache/conftool/dbconfig/20220228-105716-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300992)', diff saved to https://phabricator.wikimedia.org/P21584 and previous config saved to /var/cache/conftool/dbconfig/20220228-105447-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1114.eqiad.wmnet with OS bullseye
  • 10:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
  • 10:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1088.eqiad.wmnet with reason: host reimage
  • 10:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P21583 and previous config saved to /var/cache/conftool/dbconfig/20220228-103942-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1114.eqiad.wmnet with reason: host reimage
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1114.eqiad.wmnet with reason: host reimage
  • 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1088.eqiad.wmnet with OS buster
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P21582 and previous config saved to /var/cache/conftool/dbconfig/20220228-102438-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1114.eqiad.wmnet with OS bullseye
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T302185)', diff saved to https://phabricator.wikimedia.org/P21581 and previous config saved to /var/cache/conftool/dbconfig/20220228-101815-ladsgroup.json
  • 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21580 and previous config saved to /var/cache/conftool/dbconfig/20220228-101726-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300992)', diff saved to https://phabricator.wikimedia.org/P21579 and previous config saved to /var/cache/conftool/dbconfig/20220228-100933-ladsgroup.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21578 and previous config saved to /var/cache/conftool/dbconfig/20220228-100221-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300992)', diff saved to https://phabricator.wikimedia.org/P21577 and previous config saved to /var/cache/conftool/dbconfig/20220228-095056-ladsgroup.json
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21576 and previous config saved to /var/cache/conftool/dbconfig/20220228-094717-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21575 and previous config saved to /var/cache/conftool/dbconfig/20220228-093212-ladsgroup.json
  • 09:29 volans@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300992)', diff saved to https://phabricator.wikimedia.org/P21574 and previous config saved to /var/cache/conftool/dbconfig/20220228-092830-ladsgroup.json
  • 09:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1126.eqiad.wmnet with OS bullseye
  • 09:22 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:16 moritzm: restarting Hue to pick up expat security updates
  • 09:13 moritzm: restarting turnilo to pick up expat security updates
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P21573 and previous config saved to /var/cache/conftool/dbconfig/20220228-091325-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1126.eqiad.wmnet with reason: host reimage
  • 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1126.eqiad.wmnet with reason: host reimage
  • 09:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1126.eqiad.wmnet with OS bullseye
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P21572 and previous config saved to /var/cache/conftool/dbconfig/20220228-085820-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T302185)', diff saved to https://phabricator.wikimedia.org/P21571 and previous config saved to /var/cache/conftool/dbconfig/20220228-085329-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T302185)', diff saved to https://phabricator.wikimedia.org/P21570 and previous config saved to /var/cache/conftool/dbconfig/20220228-085224-ladsgroup.json
  • 08:51 moritzm: installing expat security updates
  • 08:47 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300992)', diff saved to https://phabricator.wikimedia.org/P21567 and previous config saved to /var/cache/conftool/dbconfig/20220228-084316-ladsgroup.json
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21566 and previous config saved to /var/cache/conftool/dbconfig/20220228-083720-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21564 and previous config saved to /var/cache/conftool/dbconfig/20220228-082215-ladsgroup.json
  • 08:10 taavi: UTC morning deploys done
  • 08:09 taavi@deploy1002: Synchronized logos/config.yaml: Config: Change temporary logo for slwiki (T302661) (duration: 00m 48s)
  • 08:09 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: Change temporary logo for slwiki (T302661) (duration: 00m 48s)
  • 08:08 taavi@deploy1002: Synchronized static/images/project-logos: Config: Change temporary logo for slwiki (T302661) (duration: 00m 50s)
  • 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T302185)', diff saved to https://phabricator.wikimedia.org/P21563 and previous config saved to /var/cache/conftool/dbconfig/20220228-080710-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300992)', diff saved to https://phabricator.wikimedia.org/P21562 and previous config saved to /var/cache/conftool/dbconfig/20220228-080613-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300992)', diff saved to https://phabricator.wikimedia.org/P21561 and previous config saved to /var/cache/conftool/dbconfig/20220228-080559-ladsgroup.json
  • 08:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1177.eqiad.wmnet with OS bullseye
  • 08:00 godog: enable notifications for thanos-be1003 in icinga and clear up /srv/swift-storage/sdm1 since it was filling up /
  • 07:58 moritzm: drain instances off ganeti2007 for eventual decom
  • 07:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P21560 and previous config saved to /var/cache/conftool/dbconfig/20220228-075054-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1177.eqiad.wmnet with reason: host reimage
  • 07:45 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1177.eqiad.wmnet with reason: host reimage
  • 07:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P21559 and previous config saved to /var/cache/conftool/dbconfig/20220228-073550-ladsgroup.json
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1177.eqiad.wmnet with OS bullseye
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T302185)', diff saved to https://phabricator.wikimedia.org/P21558 and previous config saved to /var/cache/conftool/dbconfig/20220228-072546-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T302185)', diff saved to https://phabricator.wikimedia.org/P21557 and previous config saved to /var/cache/conftool/dbconfig/20220228-072314-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300992)', diff saved to https://phabricator.wikimedia.org/P21556 and previous config saved to /var/cache/conftool/dbconfig/20220228-072045-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21555 and previous config saved to /var/cache/conftool/dbconfig/20220228-070809-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300992)', diff saved to https://phabricator.wikimedia.org/P21554 and previous config saved to /var/cache/conftool/dbconfig/20220228-070148-ladsgroup.json
  • 07:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21553 and previous config saved to /var/cache/conftool/dbconfig/20220228-065304-ladsgroup.json
  • 06:42 XioNoX: configure BGP between codfw and eqdfw
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T302185)', diff saved to https://phabricator.wikimedia.org/P21552 and previous config saved to /var/cache/conftool/dbconfig/20220228-063800-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1178.eqiad.wmnet with OS bullseye
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300992)', diff saved to https://phabricator.wikimedia.org/P21551 and previous config saved to /var/cache/conftool/dbconfig/20220228-062236-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: host reimage
  • 06:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: host reimage
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P21550 and previous config saved to /var/cache/conftool/dbconfig/20220228-060731-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1178.eqiad.wmnet with OS bullseye
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T302185)', diff saved to https://phabricator.wikimedia.org/P21549 and previous config saved to /var/cache/conftool/dbconfig/20220228-055626-ladsgroup.json
  • 05:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 05:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T302185)', diff saved to https://phabricator.wikimedia.org/P21548 and previous config saved to /var/cache/conftool/dbconfig/20220228-055530-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P21547 and previous config saved to /var/cache/conftool/dbconfig/20220228-055226-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21546 and previous config saved to /var/cache/conftool/dbconfig/20220228-054025-ladsgroup.json
  • 05:38 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/includes/content/ContentHandler.php: Backport: ContentHandler: Use ParserOutputAccess for accessing ParserOutput (T302620) (duration: 00m 49s)
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300992)', diff saved to https://phabricator.wikimedia.org/P21545 and previous config saved to /var/cache/conftool/dbconfig/20220228-053721-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21544 and previous config saved to /var/cache/conftool/dbconfig/20220228-052521-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300992)', diff saved to https://phabricator.wikimedia.org/P21543 and previous config saved to /var/cache/conftool/dbconfig/20220228-051905-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T302185)', diff saved to https://phabricator.wikimedia.org/P21542 and previous config saved to /var/cache/conftool/dbconfig/20220228-051016-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1172.eqiad.wmnet with OS bullseye
  • 04:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1172.eqiad.wmnet with reason: host reimage
  • 04:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1172.eqiad.wmnet with reason: host reimage
  • 04:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1172.eqiad.wmnet with OS bullseye
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T302185)', diff saved to https://phabricator.wikimedia.org/P21541 and previous config saved to /var/cache/conftool/dbconfig/20220228-043003-ladsgroup.json
  • 04:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 04:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance

2022-02-27

  • 20:42 XioNoX: configure OSPF between cr2-drmrs and cr2-eqdfw

2022-02-25

  • 23:32 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 23:30 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21540 and previous config saved to /var/cache/conftool/dbconfig/20220225-213704-ladsgroup.json
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21539 and previous config saved to /var/cache/conftool/dbconfig/20220225-212159-ladsgroup.json
  • 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21538 and previous config saved to /var/cache/conftool/dbconfig/20220225-210654-ladsgroup.json
  • 21:02 ryankemper: [WDQS] Restarted wdqs eqiad exporters: `ryankemper@cumin1001:~$ sudo -E cumin -b 1 'wdqs1*' 'systemctl restart prometheus-blazegraph-exporter-wdqs-blazegraph.service'`
  • 21:01 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good. Still looking into `Reduced availability for job jmx_wdqs_updater`; will try restarting blazegraph exporters in eqiad
  • 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21537 and previous config saved to /var/cache/conftool/dbconfig/20220225-205149-ladsgroup.json
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21536 and previous config saved to /var/cache/conftool/dbconfig/20220225-204844-ladsgroup.json
  • 20:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21535 and previous config saved to /var/cache/conftool/dbconfig/20220225-204836-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21534 and previous config saved to /var/cache/conftool/dbconfig/20220225-203331-ladsgroup.json
  • 20:31 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 20:31 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 20:31 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 20:30 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@5d384a5]: 0.3.104 (duration: 07m 18s)
  • 20:23 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.104` on canary `wdqs1003`; proceeding to rest of fleet
  • 20:22 ryankemper@deploy1002: Started deploy [wdqs/wdqs@5d384a5]: 0.3.104
  • 20:22 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.104`. Pre-deploy tests passing on canary `wdqs1003`
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21533 and previous config saved to /var/cache/conftool/dbconfig/20220225-201826-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21532 and previous config saved to /var/cache/conftool/dbconfig/20220225-200322-ladsgroup.json
  • 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21531 and previous config saved to /var/cache/conftool/dbconfig/20220225-195917-ladsgroup.json
  • 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21530 and previous config saved to /var/cache/conftool/dbconfig/20220225-195658-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21529 and previous config saved to /var/cache/conftool/dbconfig/20220225-194153-ladsgroup.json
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21528 and previous config saved to /var/cache/conftool/dbconfig/20220225-192649-ladsgroup.json
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21527 and previous config saved to /var/cache/conftool/dbconfig/20220225-191144-ladsgroup.json
  • 19:11 jgleeson: payments updated from 4638c0ec to 3dfac3b2
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21526 and previous config saved to /var/cache/conftool/dbconfig/20220225-190939-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 19:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21525 and previous config saved to /var/cache/conftool/dbconfig/20220225-190737-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21524 and previous config saved to /var/cache/conftool/dbconfig/20220225-185233-ladsgroup.json
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21523 and previous config saved to /var/cache/conftool/dbconfig/20220225-183728-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21522 and previous config saved to /var/cache/conftool/dbconfig/20220225-182223-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21521 and previous config saved to /var/cache/conftool/dbconfig/20220225-181918-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21520 and previous config saved to /var/cache/conftool/dbconfig/20220225-181911-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21519 and previous config saved to /var/cache/conftool/dbconfig/20220225-180406-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21518 and previous config saved to /var/cache/conftool/dbconfig/20220225-174901-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21517 and previous config saved to /var/cache/conftool/dbconfig/20220225-173356-ladsgroup.json
  • 17:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates: better error messages and code cleanup (prod) (duration: 08m 20s)
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21516 and previous config saved to /var/cache/conftool/dbconfig/20220225-172845-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21515 and previous config saved to /var/cache/conftool/dbconfig/20220225-172837-ladsgroup.json
  • 17:21 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates: better error messages and code cleanup (prod)
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21514 and previous config saved to /var/cache/conftool/dbconfig/20220225-171333-ladsgroup.json
  • 17:12 ebernhardson: manual trigger of cirrus SaneitizeJobs for with 2hr refresh
  • 17:01 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates: better error messages and code cleanup (duration: 01m 57s)
  • 16:59 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates: better error messages and code cleanup
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21513 and previous config saved to /var/cache/conftool/dbconfig/20220225-165828-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21512 and previous config saved to /var/cache/conftool/dbconfig/20220225-164323-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21511 and previous config saved to /var/cache/conftool/dbconfig/20220225-164020-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:36 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3063.esams.wmnet with OS buster
  • 16:35 vgutierrez: pool cp3063 running HAProxy as TLS termination layer - T290005 T271421
  • 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 16:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3063.esams.wmnet with reason: host reimage
  • 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3063.esams.wmnet with OS buster
  • 15:36 moritzm: imported PHP 7.4 7.4.28-1+0~20220217.59+debian10~1.gbp1950+wmf1+buster1 to component/php74 for buster-wikimedia T271736
  • 15:25 vgutierrez: pool cp5005 running HAProxy as TLS termination layer - T290005 T271421
  • 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5005.eqsin.wmnet with OS buster
  • 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5005.eqsin.wmnet with reason: host reimage
  • 14:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5005.eqsin.wmnet with reason: host reimage
  • 14:13 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: fix wmf-puppet-dashboard routes (duration: 07m 47s)
  • 14:05 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: fix wmf-puppet-dashboard routes
  • 14:04 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5005.eqsin.wmnet with OS buster
  • 13:56 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: deploying wmf-proxy-dashboard and wmf-puppet-dashboard changes for real after fixing the scap config (duration: 04m 50s)
  • 13:52 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: deploying wmf-proxy-dashboard and wmf-puppet-dashboard changes for real after fixing the scap config
  • off: restoring psql-all-dbs-20220225.sql.gz into netbox
  • 13:30 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process (duration: 00m 06s)
  • 13:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process
  • 13:30 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process
  • 13:29 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process (duration: 00m 05s)
  • 13:29 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): debugging deployment process
  • 12:46 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-proxy-dashboard on eqiad1 (duration: 02m 04s)
  • 12:44 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-proxy-dashboard on eqiad1
  • 12:39 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): updating wmf-proxy-dashboard (duration: 00m 37s)
  • 12:39 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): updating wmf-proxy-dashboard
  • 12:39 moritzm: drain instances off ganeti2007 T302577
  • 12:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2040.codfw.wmnet with OS buster
  • 12:32 vgutierrez: pool cp2040 running HAProxy as TLS termination layer - T290005 T271421
  • 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
  • 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2040.codfw.wmnet with reason: host reimage
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2030.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 11:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2040.codfw.wmnet with OS buster
  • 11:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 11:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4025.ulsfo.wmnet with OS buster
  • 11:40 vgutierrez: pool cp4025 running HAProxy as TLS termination layer - T290005 T271421
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 11:20 XioNoX: re-activate BGP session to Seabone in esams
  • 11:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4025.ulsfo.wmnet with reason: host reimage
  • 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4025.ulsfo.wmnet with reason: host reimage
  • 11:04 moritzm: added ganeti2029 to codfw Ganeti cluster T298998
  • 10:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4025.ulsfo.wmnet with OS buster
  • 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 10:41 moritzm: enabled virtualisation in BIOS for ganeti2029 T298998
  • 10:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2029.codfw.wmnet with reason: Enable virtualisation in BIOS
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2029.codfw.wmnet with reason: Enable virtualisation in BIOS
  • 10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 10:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2029.codfw.wmnet to ganeti01.svc.codfw.wmnet
  • 10:17 vgutierrez: rolling upgrade to HAProxy 2.4.13 on HAProxy cache nodes - T290005
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 09:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 02:43 cstone: Donation Interface revision changed from a6a9b63e to 4638c0ec

2022-02-24

  • 23:35 ryankemper: T302526 Deployed https://gerrit.wikimedia.org/r/765652 and ran puppet across wcqs*
  • 22:06 mutante: static-bugzilla.wikimedia.org - kubernetes - deployed gerrit:765572 - first prod service behind a k8s ingress (T290966)
  • 22:05 mutante: phabricator - disabled git repo - labs-tools-harvesting-data-refinery/repository/master/
  • 21:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2086.codfw.wmnet with OS bullseye
  • 21:45 brennen: end of UTC late backport & config window
  • 21:43 dancy@deploy1002: Started scap: testing scap container image building
  • 21:43 tzatziki: removing 1 file for legal compliance
  • 21:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2085.codfw.wmnet with OS bullseye
  • 21:41 mutante: phabricator - disabled git repo "frig" - outdated fundraising stuff, checked with fr-tech, not needed T296022
  • 21:40 brennen@deploy1002: Synchronized php-1.38.0-wmf.23/includes: Backport: Revert "Revert "Revert "Show message fallback keys when using &uselang=qqx""" (duration: 00m 57s)
  • 21:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2086.codfw.wmnet with reason: host reimage
  • 21:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2086.codfw.wmnet with reason: host reimage
  • 21:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2085.codfw.wmnet with reason: host reimage
  • 21:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2085.codfw.wmnet with reason: host reimage
  • 21:29 brennen@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: cirrus: Reduce write isolation to only cloudelastic (T295705) (duration: 00m 55s)
  • 21:27 mutante: phabricator - disabling git repo rGEDS (Elasticdash) - only one commit from 2015 - T296022
  • 21:19 tzatziki: removing 1 file for legal compliance
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2086.codfw.wmnet with OS bullseye
  • 21:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2083.codfw.wmnet with OS bullseye
  • 21:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2085.codfw.wmnet with OS bullseye
  • 21:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2084.codfw.wmnet with OS bullseye
  • 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2083.codfw.wmnet with reason: host reimage
  • 21:05 tzatziki: removing 4 files for legal compliance
  • 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2083.codfw.wmnet with reason: host reimage
  • 21:02 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: (no justification provided) (duration: 03m 18s)
  • 21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2084.codfw.wmnet with reason: host reimage
  • 20:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye
  • 20:58 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: (no justification provided)
  • 20:58 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2084.codfw.wmnet with reason: host reimage
  • 20:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye
  • 20:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2084.codfw.wmnet with OS bullseye
  • 20:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2083.codfw.wmnet with OS bullseye
  • 20:04 ryankemper: T302526 `ryankemper@cumin1001:~$ sudo -E cumin -b 3 'wcqs*' 'enable-puppet "query_service: Simply jvm arg handling - T302526"; sudo run-puppet-agent'` in tmux `wcqs`
  • 20:02 ryankemper: T302526 Depooled `wcqs1001`, ran puppet agent, and restarted `wcqs-blazegraph`. Service came up healthy, proceeding to rest of wcqs fleet
  • 19:57 ryankemper: T302526 `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'wdqs*' 'enable-puppet "query_service: Simply jvm arg handling - T302526"; sudo run-puppet-agent'` in tmux `deploy_window`
  • 19:55 ryankemper: T302526 Depooled canary `wdqs1003`, ran puppet agent, and restarted `wdqs-blazegraph`. Tests look good, proceeding to rest of wdqs fleet
  • 19:48 ryankemper: T302526 (Forgot to merge patch first, take two)
  • 19:48 ryankemper: T302526 Running puppet on wdqs canary: `ryankemper@wdqs1003:~$ sudo enable-puppet "query_service: Simply jvm arg handling - T302526" && sudo run-puppet-agent`
  • 19:46 ryankemper: T302526 Disabling puppet across entire query service (wdqs & wcqs) fleet for merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/761080: `ryankemper@cumin1001:~$ sudo -E cumin 'w*qs*' 'disable-puppet "query_service: Simply jvm arg handling - T302526"'`
  • 19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.23 refs T300199
  • 19:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2084.codfw.wmnet with OS bullseye
  • 18:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2083.codfw.wmnet with OS bullseye
  • 18:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2082.codfw.wmnet with OS bullseye
  • 18:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:45 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2085.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage
  • 18:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2082.codfw.wmnet with reason: host reimage
  • 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2084.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2082.codfw.wmnet with OS bullseye
  • 18:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21508 and previous config saved to /var/cache/conftool/dbconfig/20220224-182102-kormat.json
  • 18:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2081.codfw.wmnet with OS bullseye
  • 18:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21506 and previous config saved to /var/cache/conftool/dbconfig/20220224-180557-kormat.json
  • 18:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2083.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage
  • 18:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:02 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 18:01 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 18:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2081.codfw.wmnet with reason: host reimage
  • 18:00 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 17:59 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 17:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:50 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21504 and previous config saved to /var/cache/conftool/dbconfig/20220224-175052-kormat.json
  • 17:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2082.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[1039,1043].eqiad.wmnet
  • 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2081.codfw.wmnet with OS bullseye
  • 17:40 elukey: `truncate -s 1g /var/log/auth.log.1` on krb1001 to free space on the root partition
  • 17:38 elukey: `truncate -s 1g /var/log/auth.log` on krb1001 to free space on the root partition
  • 17:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21503 and previous config saved to /var/cache/conftool/dbconfig/20220224-173548-kormat.json
  • 17:33 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300774)', diff saved to https://phabricator.wikimedia.org/P21502 and previous config saved to /var/cache/conftool/dbconfig/20220224-173307-kormat.json
  • 17:33 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:33 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 17:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300774)', diff saved to https://phabricator.wikimedia.org/P21501 and previous config saved to /var/cache/conftool/dbconfig/20220224-173259-kormat.json
  • 17:32 krinkle@deploy1002: Synchronized wmf-config/: Ia61fea (duration: 00m 52s)
  • 17:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2080.codfw.wmnet with OS bullseye
  • 17:22 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1039,1043].eqiad.wmnet
  • 17:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2080.codfw.wmnet with reason: host reimage
  • 17:17 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21500 and previous config saved to /var/cache/conftool/dbconfig/20220224-171755-kormat.json
  • 17:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2080.codfw.wmnet with reason: host reimage
  • 17:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:11 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:11 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:11 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:02 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21499 and previous config saved to /var/cache/conftool/dbconfig/20220224-170250-kormat.json
  • 16:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2080.codfw.wmnet with OS bullseye
  • 16:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2079.codfw.wmnet with OS bullseye
  • 16:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300774)', diff saved to https://phabricator.wikimedia.org/P21498 and previous config saved to /var/cache/conftool/dbconfig/20220224-164745-kormat.json
  • 16:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300774)', diff saved to https://phabricator.wikimedia.org/P21497 and previous config saved to /var/cache/conftool/dbconfig/20220224-164506-kormat.json
  • 16:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 16:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 16:45 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300774)', diff saved to https://phabricator.wikimedia.org/P21496 and previous config saved to /var/cache/conftool/dbconfig/20220224-164458-kormat.json
  • 16:44 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2079.codfw.wmnet with reason: host reimage
  • 16:38 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2079.codfw.wmnet with reason: host reimage
  • 16:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:29 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21495 and previous config saved to /var/cache/conftool/dbconfig/20220224-162953-kormat.json
  • 16:27 jbond: deploy new firmware fact
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2079.codfw.wmnet with OS bullseye
  • 16:15 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:14 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21494 and previous config saved to /var/cache/conftool/dbconfig/20220224-161449-kormat.json
  • 16:14 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300774)', diff saved to https://phabricator.wikimedia.org/P21493 and previous config saved to /var/cache/conftool/dbconfig/20220224-155944-kormat.json
  • 15:57 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300774)', diff saved to https://phabricator.wikimedia.org/P21492 and previous config saved to /var/cache/conftool/dbconfig/20220224-155708-kormat.json
  • 15:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:57 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 15:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 15:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 15:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 15:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:55 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300774)', diff saved to https://phabricator.wikimedia.org/P21491 and previous config saved to /var/cache/conftool/dbconfig/20220224-155521-kormat.json
  • 15:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:52 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:52 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:47 moritzm: restarting apache on otrs1001/ticket.wikimedia.org
  • 15:44 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:42 moritzm: restarting apache on people.w.o, planet.w.o, releases* to pick up expat update
  • 15:42 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:40 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21490 and previous config saved to /var/cache/conftool/dbconfig/20220224-154016-kormat.json
  • 15:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:37 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1032.eqiad.wmnet with OS buster
  • 15:25 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21489 and previous config saved to /var/cache/conftool/dbconfig/20220224-152512-kormat.json
  • 15:10 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300774)', diff saved to https://phabricator.wikimedia.org/P21488 and previous config saved to /var/cache/conftool/dbconfig/20220224-151007-kormat.json
  • 15:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 15:05 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300774)', diff saved to https://phabricator.wikimedia.org/P21487 and previous config saved to /var/cache/conftool/dbconfig/20220224-150527-kormat.json
  • 15:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300774)', diff saved to https://phabricator.wikimedia.org/P21486 and previous config saved to /var/cache/conftool/dbconfig/20220224-150520-kormat.json
  • 14:50 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21484 and previous config saved to /var/cache/conftool/dbconfig/20220224-145015-kormat.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21483 and previous config saved to /var/cache/conftool/dbconfig/20220224-144511-ladsgroup.json
  • 14:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21482 and previous config saved to /var/cache/conftool/dbconfig/20220224-143509-kormat.json
  • 14:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P21481 and previous config saved to /var/cache/conftool/dbconfig/20220224-143005-ladsgroup.json
  • 14:20 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300774)', diff saved to https://phabricator.wikimedia.org/P21480 and previous config saved to /var/cache/conftool/dbconfig/20220224-142004-kormat.json
  • 14:19 XioNoX: Prepend AS to anycast prefixes learned on the core routers - T302315
  • 14:17 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300774)', diff saved to https://phabricator.wikimedia.org/P21479 and previous config saved to /var/cache/conftool/dbconfig/20220224-141724-kormat.json
  • 14:17 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:17 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:17 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300774)', diff saved to https://phabricator.wikimedia.org/P21478 and previous config saved to /var/cache/conftool/dbconfig/20220224-141717-kormat.json
  • 14:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P21477 and previous config saved to /var/cache/conftool/dbconfig/20220224-141501-ladsgroup.json
  • 14:02 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21476 and previous config saved to /var/cache/conftool/dbconfig/20220224-140212-kormat.json
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2121.codfw.wmnet with OS bullseye
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21475 and previous config saved to /var/cache/conftool/dbconfig/20220224-135955-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21474 and previous config saved to /var/cache/conftool/dbconfig/20220224-135819-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300992)', diff saved to https://phabricator.wikimedia.org/P21473 and previous config saved to /var/cache/conftool/dbconfig/20220224-135811-ladsgroup.json
  • 13:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21472 and previous config saved to /var/cache/conftool/dbconfig/20220224-134707-kormat.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2121.codfw.wmnet with reason: host reimage
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P21471 and previous config saved to /var/cache/conftool/dbconfig/20220224-134307-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2121.codfw.wmnet with reason: host reimage
  • 13:32 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300774)', diff saved to https://phabricator.wikimedia.org/P21470 and previous config saved to /var/cache/conftool/dbconfig/20220224-133202-kormat.json
  • 13:29 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300774)', diff saved to https://phabricator.wikimedia.org/P21469 and previous config saved to /var/cache/conftool/dbconfig/20220224-132923-kormat.json
  • 13:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:29 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300774)', diff saved to https://phabricator.wikimedia.org/P21468 and previous config saved to /var/cache/conftool/dbconfig/20220224-132915-kormat.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P21467 and previous config saved to /var/cache/conftool/dbconfig/20220224-132802-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2121.codfw.wmnet with OS bullseye
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 13:23 Amir1: dbmaint on s7@codfw (T302363)
  • 13:14 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21466 and previous config saved to /var/cache/conftool/dbconfig/20220224-131410-kormat.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300992)', diff saved to https://phabricator.wikimedia.org/P21465 and previous config saved to /var/cache/conftool/dbconfig/20220224-131257-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T300992)', diff saved to https://phabricator.wikimedia.org/P21464 and previous config saved to /var/cache/conftool/dbconfig/20220224-131041-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21463 and previous config saved to /var/cache/conftool/dbconfig/20220224-131033-ladsgroup.json
  • 13:02 moritzm: restarting apache/uwsgi-puppetboard on puppetboard* to pick up expat security updates
  • 12:59 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21462 and previous config saved to /var/cache/conftool/dbconfig/20220224-125905-kormat.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P21461 and previous config saved to /var/cache/conftool/dbconfig/20220224-125528-ladsgroup.json
  • 12:44 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300774)', diff saved to https://phabricator.wikimedia.org/P21460 and previous config saved to /var/cache/conftool/dbconfig/20220224-124401-kormat.json
  • 12:41 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300774)', diff saved to https://phabricator.wikimedia.org/P21459 and previous config saved to /var/cache/conftool/dbconfig/20220224-124122-kormat.json
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:40 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300774)', diff saved to https://phabricator.wikimedia.org/P21458 and previous config saved to /var/cache/conftool/dbconfig/20220224-124036-kormat.json
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P21457 and previous config saved to /var/cache/conftool/dbconfig/20220224-124024-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2079.codfw.wmnet with OS bullseye
  • 12:25 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21456 and previous config saved to /var/cache/conftool/dbconfig/20220224-122532-kormat.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21455 and previous config saved to /var/cache/conftool/dbconfig/20220224-122519-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2079.codfw.wmnet with reason: host reimage
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21454 and previous config saved to /var/cache/conftool/dbconfig/20220224-122232-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300992)', diff saved to https://phabricator.wikimedia.org/P21453 and previous config saved to /var/cache/conftool/dbconfig/20220224-122224-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2079.codfw.wmnet with reason: host reimage
  • 12:11 Amir1: dbmaint on s8@codfw (T302185)
  • 12:10 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21452 and previous config saved to /var/cache/conftool/dbconfig/20220224-121027-kormat.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P21451 and previous config saved to /var/cache/conftool/dbconfig/20220224-120720-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2079.codfw.wmnet with OS bullseye
  • 12:04 arturo: aborrero@apt1001:~$ sudo -i reprepro --component thirdparty/openstack-db update bullseye-wikimedia (T302482)
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 11:55 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300774)', diff saved to https://phabricator.wikimedia.org/P21450 and previous config saved to /var/cache/conftool/dbconfig/20220224-115522-kormat.json
  • 11:52 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300774)', diff saved to https://phabricator.wikimedia.org/P21449 and previous config saved to /var/cache/conftool/dbconfig/20220224-115246-kormat.json
  • 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P21448 and previous config saved to /var/cache/conftool/dbconfig/20220224-115215-ladsgroup.json
  • 11:52 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 11:52 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 11:52 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300774)', diff saved to https://phabricator.wikimedia.org/P21447 and previous config saved to /var/cache/conftool/dbconfig/20220224-115159-kormat.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300992)', diff saved to https://phabricator.wikimedia.org/P21446 and previous config saved to /var/cache/conftool/dbconfig/20220224-113710-ladsgroup.json
  • 11:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21445 and previous config saved to /var/cache/conftool/dbconfig/20220224-113654-kormat.json
  • 11:35 kart_: Updated cxserver to 2022-02-24-035645-production (T301443, T301952)
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T300992)', diff saved to https://phabricator.wikimedia.org/P21444 and previous config saved to /var/cache/conftool/dbconfig/20220224-113453-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300992)', diff saved to https://phabricator.wikimedia.org/P21443 and previous config saved to /var/cache/conftool/dbconfig/20220224-113439-ladsgroup.json
  • 11:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 11:31 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 11:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 11:25 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 11:23 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 11:22 moritzm: rolling restart of thanos frontend swift-proxy/apache to pick up expat security updates
  • 11:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 11:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21442 and previous config saved to /var/cache/conftool/dbconfig/20220224-112149-kormat.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P21441 and previous config saved to /var/cache/conftool/dbconfig/20220224-111935-ladsgroup.json
  • 11:07 hashar: Updated Jenkins job operations-puppet-tests-buster-docker https://gerrit.wikimedia.org/r/c/integration/config/+/765487
  • 11:06 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300774)', diff saved to https://phabricator.wikimedia.org/P21440 and previous config saved to /var/cache/conftool/dbconfig/20220224-110645-kormat.json
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P21439 and previous config saved to /var/cache/conftool/dbconfig/20220224-110430-ladsgroup.json
  • 11:04 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300774)', diff saved to https://phabricator.wikimedia.org/P21438 and previous config saved to /var/cache/conftool/dbconfig/20220224-110403-kormat.json
  • 11:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:03 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300774)', diff saved to https://phabricator.wikimedia.org/P21437 and previous config saved to /var/cache/conftool/dbconfig/20220224-110355-kormat.json
  • 11:03 moritzm: restarting apache/carbon-cache on graphite nodes to pick up expat update
  • 10:54 aqu@deploy1002: Finished deploy [airflow-dags/analytics@97759bf]: Set aqs/hourly start date (duration: 00m 06s)
  • 10:54 aqu@deploy1002: Started deploy [airflow-dags/analytics@97759bf]: Set aqs/hourly start date
  • 10:52 moritzm: restarting apache on main prometheus nodes to pick up expat update
  • 10:49 mmandere: enable-puppet on cp instances after finishing successfully testing varnish package component change - T302301
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300992)', diff saved to https://phabricator.wikimedia.org/P21436 and previous config saved to /var/cache/conftool/dbconfig/20220224-104925-ladsgroup.json
  • 10:48 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21435 and previous config saved to /var/cache/conftool/dbconfig/20220224-104851-kormat.json
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T300992)', diff saved to https://phabricator.wikimedia.org/P21434 and previous config saved to /var/cache/conftool/dbconfig/20220224-104708-ladsgroup.json
  • 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21433 and previous config saved to /var/cache/conftool/dbconfig/20220224-104700-ladsgroup.json
  • 10:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21432 and previous config saved to /var/cache/conftool/dbconfig/20220224-103346-kormat.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P21431 and previous config saved to /var/cache/conftool/dbconfig/20220224-103156-ladsgroup.json
  • 10:18 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300774)', diff saved to https://phabricator.wikimedia.org/P21430 and previous config saved to /var/cache/conftool/dbconfig/20220224-101841-kormat.json
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P21429 and previous config saved to /var/cache/conftool/dbconfig/20220224-101651-ladsgroup.json
  • 10:16 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300774)', diff saved to https://phabricator.wikimedia.org/P21428 and previous config saved to /var/cache/conftool/dbconfig/20220224-101559-kormat.json
  • 10:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:15 kormat: deploying schema change to s1 T300774
  • 10:13 mmandere: depool cp4028.ulsfo.wmnet - T302301
  • 10:02 moritzm: restarting apache on edge prometheus nodes to pick up expat update
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21427 and previous config saved to /var/cache/conftool/dbconfig/20220224-100147-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T300992)', diff saved to https://phabricator.wikimedia.org/P21426 and previous config saved to /var/cache/conftool/dbconfig/20220224-095912-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300992)', diff saved to https://phabricator.wikimedia.org/P21425 and previous config saved to /var/cache/conftool/dbconfig/20220224-095904-ladsgroup.json
  • 09:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics@d28cd92]: Fix aqs/hourly in production by adding memory to driver (duration: 00m 06s)
  • 09:58 aqu@deploy1002: Started deploy [airflow-dags/analytics@d28cd92]: Fix aqs/hourly in production by adding memory to driver
  • 09:58 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@d28cd92]: Fix aqs/hourly in production by adding memory to driver (duration: 00m 09s)
  • 09:58 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@d28cd92]: Fix aqs/hourly in production by adding memory to driver
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P21424 and previous config saved to /var/cache/conftool/dbconfig/20220224-094400-ladsgroup.json
  • 09:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 09:31 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 09:31 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P21423 and previous config saved to /var/cache/conftool/dbconfig/20220224-092855-ladsgroup.json
  • 09:25 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 09:25 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 09:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 09:24 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@17a70a0]: (no justification provided) (duration: 00m 08s)
  • 09:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 09:24 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@17a70a0]: (no justification provided)
  • 09:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:15 urbanecm: Morning B&C window is done
  • 09:14 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/GrowthExperiments/modules/ext.growthExperiments.StructuredTask/StructuredTaskArticleTarget.js: Backport: Structured task: Don't show dialog for confirming leaving suggestions mode upon rejection (T302463) (duration: 00m 50s)
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300992)', diff saved to https://phabricator.wikimedia.org/P21422 and previous config saved to /var/cache/conftool/dbconfig/20220224-091350-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T300992)', diff saved to https://phabricator.wikimedia.org/P21421 and previous config saved to /var/cache/conftool/dbconfig/20220224-091132-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:01 urbanecm: Morning B&C window is overrunning
  • 08:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:16 moritzm: installing expat security updates
  • 06:36 tgr_: T301030#7734236 running UpdateWeightedTags.php on eswiki
  • 02:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2078.codfw.wmnet with OS bullseye
  • 02:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2078.codfw.wmnet with reason: host reimage
  • 01:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2077.codfw.wmnet with OS bullseye
  • 01:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2078.codfw.wmnet with reason: host reimage
  • 01:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2077.codfw.wmnet with reason: host reimage
  • 01:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2077.codfw.wmnet with reason: host reimage
  • 01:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2078.codfw.wmnet with OS bullseye
  • 01:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2076.codfw.wmnet with OS bullseye
  • 01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2076.codfw.wmnet with reason: host reimage
  • 01:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2077.codfw.wmnet with OS bullseye
  • 01:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2075.codfw.wmnet with OS bullseye
  • 01:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2076.codfw.wmnet with reason: host reimage
  • 01:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2075.codfw.wmnet with reason: host reimage
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2075.codfw.wmnet with reason: host reimage
  • 01:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2076.codfw.wmnet with OS bullseye
  • 01:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2075.codfw.wmnet with OS bullseye
  • 01:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2075.codfw.wmnet with OS bullseye
  • 00:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2074.codfw.wmnet with OS bullseye
  • 00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2075.codfw.wmnet with OS bullseye
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2073.codfw.wmnet with OS bullseye
  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2074.codfw.wmnet with reason: host reimage
  • 00:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2074.codfw.wmnet with reason: host reimage
  • 00:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2073.codfw.wmnet with reason: host reimage
  • 00:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2073.codfw.wmnet with reason: host reimage
  • 00:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2074.codfw.wmnet with OS bullseye
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2079.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2073.codfw.wmnet with OS bullseye
  • 00:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2079.mgmt.codfw.wmnet with reboot policy FORCED

2022-02-23

  • 23:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2069.codfw.wmnet with OS stretch
  • 23:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
  • 23:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2069.codfw.wmnet with reason: host reimage
  • 22:58 mutante: phabricator - disabled empty but active repo: wikidata-query-LDFServer (WQLD) created in 2018 by qchris (T296022)
  • 22:51 mutante: phabricator - disabled empty but active repos: dibyaduttabook and xtools-H (T296022)
  • 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS stretch
  • 22:37 mutante: phabricator - disabling repository dibyaduttabook
  • 22:09 reedy@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/SecurePoll/cli/wm-scripts/ucoc/: (no justification provided) (duration: 00m 50s)
  • 22:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:17 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: bird6 errors expected, not serving any traffic
  • 21:17 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: bird6 errors expected, not serving any traffic
  • 21:11 dduvall@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.23 refs T300199 (duration: 01m 31s)
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.23 refs T300199
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:44 taavi: run CentralAuthUser::importLocalNames for FuzzyBot T302399
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T302363)', diff saved to https://phabricator.wikimedia.org/P21414 and previous config saved to /var/cache/conftool/dbconfig/20220223-194254-ladsgroup.json
  • 19:35 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediawiki-multiversion webserver_image_name=docker-registry.discovery.wmnet/restricted/mediawik
  • 19:35 dancy@deploy1002: Started scap: testing scap container image building
  • 19:33 dancy@deploy1002: scap failed: CalledProcessError Command 'make -f Makefile build-and-push-all-images GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediawiki-multiversion webserver_image_name=docker-registry.discovery.wmnet/restricted/mediawiki-webserver' returned non-zero exit status 2. (duration: 00m 03s)
  • 19:33 dancy@deploy1002: Started scap: testing scap container image building
  • 19:32 dancy@deploy1002: Started scap: testing scap container image building
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21413 and previous config saved to /var/cache/conftool/dbconfig/20220223-192749-ladsgroup.json
  • 19:27 dancy@deploy1002: scap failed: CalledProcessError Command 'make -f Makefile build-and-push-all-images GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediawiki-multiversion webserver_image_name=docker-registry.discovery.wmnet/restricted/mediawiki-webserver' returned non-zero exit status 2. (duration: 00m 51s)
  • 19:26 dancy@deploy1002: Started scap: testing
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21411 and previous config saved to /var/cache/conftool/dbconfig/20220223-191245-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T302363)', diff saved to https://phabricator.wikimedia.org/P21410 and previous config saved to /var/cache/conftool/dbconfig/20220223-185740-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1158.eqiad.wmnet with OS bullseye
  • 18:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
  • 18:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1158.eqiad.wmnet with reason: host reimage
  • 18:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2069.codfw.wmnet with OS stretch
  • 18:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1158.eqiad.wmnet with OS bullseye
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T302363)', diff saved to https://phabricator.wikimedia.org/P21409 and previous config saved to /var/cache/conftool/dbconfig/20220223-181350-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302363)', diff saved to https://phabricator.wikimedia.org/P21408 and previous config saved to /var/cache/conftool/dbconfig/20220223-180722-ladsgroup.json
  • 17:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2069.codfw.wmnet with OS stretch
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21407 and previous config saved to /var/cache/conftool/dbconfig/20220223-175217-ladsgroup.json
  • 17:46 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@17a70a0]: (no justification provided) (duration: 00m 07s)
  • 17:46 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@17a70a0]: (no justification provided)
  • 17:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2066.codfw.wmnet with OS stretch
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21406 and previous config saved to /var/cache/conftool/dbconfig/20220223-173711-ladsgroup.json
  • 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 17:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302363)', diff saved to https://phabricator.wikimedia.org/P21404 and previous config saved to /var/cache/conftool/dbconfig/20220223-172206-ladsgroup.json
  • 17:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS stretch
  • 17:14 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2066.codfw.wmnet with OS stretch
  • 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1127.eqiad.wmnet with OS bullseye
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1127.eqiad.wmnet with reason: host reimage
  • 16:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1127.eqiad.wmnet with reason: host reimage
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1127.eqiad.wmnet with OS bullseye
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T302363)', diff saved to https://phabricator.wikimedia.org/P21403 and previous config saved to /var/cache/conftool/dbconfig/20220223-164453-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS stretch
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2068.codfw.wmnet with OS stretch
  • 16:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300774)', diff saved to https://phabricator.wikimedia.org/P21401 and previous config saved to /var/cache/conftool/dbconfig/20220223-162125-kormat.json
  • 16:06 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P21400 and previous config saved to /var/cache/conftool/dbconfig/20220223-160621-kormat.json
  • 16:00 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia - T290005
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:52 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P21399 and previous config saved to /var/cache/conftool/dbconfig/20220223-155116-kormat.json
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS stretch
  • 15:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300774)', diff saved to https://phabricator.wikimedia.org/P21398 and previous config saved to /var/cache/conftool/dbconfig/20220223-153611-kormat.json
  • 15:30 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300774)', diff saved to https://phabricator.wikimedia.org/P21397 and previous config saved to /var/cache/conftool/dbconfig/20220223-153044-kormat.json
  • 15:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 15:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 15:26 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2068.codfw.wmnet with OS stretch
  • 15:19 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1032.eqiad.wmnet with OS buster
  • 15:17 moritzm: rolling restart of FPM and Apache on mediawiki canaries to pick up expat security updates
  • 15:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 15:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 15:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1033.eqiad.wmnet with OS buster
  • 15:12 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300774)', diff saved to https://phabricator.wikimedia.org/P21396 and previous config saved to /var/cache/conftool/dbconfig/20220223-151207-kormat.json
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 15:03 moritzm: installing expat security updates
  • 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 14:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1031.eqiad.wmnet with OS buster
  • 14:57 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P21395 and previous config saved to /var/cache/conftool/dbconfig/20220223-145703-kormat.json
  • 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 14:48 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS stretch
  • 14:48 papaul: power down ms-be2068 for re-image
  • 14:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 14:41 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P21394 and previous config saved to /var/cache/conftool/dbconfig/20220223-144158-kormat.json
  • 14:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS buster
  • 14:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 14:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 14:36 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-wikikube
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:26 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300774)', diff saved to https://phabricator.wikimedia.org/P21393 and previous config saved to /var/cache/conftool/dbconfig/20220223-142652-kormat.json
  • 14:26 mmandere: import varnishkafka_1.1.0-1_amd64.deb, varnishkafka_1.1.0-1.dsc, varnishkafka-dbg_1.1.0-1_amd64.deb to main component - T302301
  • 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302363)', diff saved to https://phabricator.wikimedia.org/P21392 and previous config saved to /var/cache/conftool/dbconfig/20220223-142413-ladsgroup.json
  • 14:21 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300774)', diff saved to https://phabricator.wikimedia.org/P21391 and previous config saved to /var/cache/conftool/dbconfig/20220223-142121-kormat.json
  • 14:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300774)', diff saved to https://phabricator.wikimedia.org/P21390 and previous config saved to /var/cache/conftool/dbconfig/20220223-142113-kormat.json
  • 14:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS buster
  • 14:18 mmandere: import varnish-modules_0.15.0-1+wmf1.dsc, varnish-modules-dbgsym_0.15.0-1+wmf1_amd64.deb, varnish-modules_0.15.0-1+wmf1_amd64.deb to main component - T302301
  • 14:18 taavi: UTC afternoon deploys done
  • 14:17 taavi: deploy second patch for T302248
  • 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:12 jayme: restarting pybal on lvs1019,lvs2009 - T290966
  • 14:11 mmandere: import libvarnishapi2_6.0.10-1wm1_amd64.deb, libvarnishapi2-dbgsym_6.0.10-1wm1_amd64.deb, libvarnishapi-dev_6.0.10-1wm1_amd64.deb to main component - T302301
  • 14:11 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/DiscussionTools/includes/Hooks/HookUtils.php: 78f0d9d: Fix check for enabling features on mobile (T302388) (duration: 00m 49s)
  • 14:10 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.22/extensions/DiscussionTools/includes/Hooks/HookUtils.php: 815b3d1: Fix check for enabling features on mobile (T302388) (duration: 00m 50s)
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21389 and previous config saved to /var/cache/conftool/dbconfig/20220223-140908-ladsgroup.json
  • 14:08 jayme: restarting pybal on lvs1020,lvs2010 - T290966
  • 14:06 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P21388 and previous config saved to /var/cache/conftool/dbconfig/20220223-140608-kormat.json
  • 14:05 mmandere: import varnish_6.0.10-1wm1.dsc, varnish_6.0.10-1wm1_amd64.deb, varnish-dbg_6.0.6-1wm1_amd64.deb, varnish-dbgsym_6.0.10-1wm1_amd64.deb, varnish-doc_6.0.10-1wm1_all.deb to main component - T302301
  • 13:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21387 and previous config saved to /var/cache/conftool/dbconfig/20220223-135404-ladsgroup.json
  • 13:52 mmandere: import libvmod-re2_1.5.3-1.dsc and libvmod-re2_1.5.3-1_amd64.deb to main component - T302301
  • 13:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P21386 and previous config saved to /var/cache/conftool/dbconfig/20220223-135103-kormat.json
  • 13:46 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:45 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:45 Lucas_WMDE: Deployed patch for T302192
  • 13:41 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:39 mmandere: import libvmod-netmapper_1.9-1.dsc and libvmod-netmapper_1.9-1_amd64.deb to main component - T302301
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302363)', diff saved to https://phabricator.wikimedia.org/P21385 and previous config saved to /var/cache/conftool/dbconfig/20220223-133858-ladsgroup.json
  • 13:38 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:37 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300774)', diff saved to https://phabricator.wikimedia.org/P21384 and previous config saved to /var/cache/conftool/dbconfig/20220223-133559-kormat.json
  • 13:30 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300774)', diff saved to https://phabricator.wikimedia.org/P21383 and previous config saved to /var/cache/conftool/dbconfig/20220223-133031-kormat.json
  • 13:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:23 Krinkle: debugging on mwdebug1002
  • 13:19 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21381 and previous config saved to /var/cache/conftool/dbconfig/20220223-131801-ladsgroup.json
  • 13:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300774)', diff saved to https://phabricator.wikimedia.org/P21380 and previous config saved to /var/cache/conftool/dbconfig/20220223-131531-kormat.json
  • 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1174.eqiad.wmnet with OS bullseye
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21379 and previous config saved to /var/cache/conftool/dbconfig/20220223-130255-ladsgroup.json
  • 13:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P21378 and previous config saved to /var/cache/conftool/dbconfig/20220223-130026-kormat.json
  • 12:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1174.eqiad.wmnet with reason: host reimage
  • 12:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1174.eqiad.wmnet with reason: host reimage
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21377 and previous config saved to /var/cache/conftool/dbconfig/20220223-124751-ladsgroup.json
  • 12:45 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P21376 and previous config saved to /var/cache/conftool/dbconfig/20220223-124521-kormat.json
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1174.eqiad.wmnet with OS bullseye
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T302363)', diff saved to https://phabricator.wikimedia.org/P21375 and previous config saved to /var/cache/conftool/dbconfig/20220223-124027-ladsgroup.json
  • 12:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302363)', diff saved to https://phabricator.wikimedia.org/P21374 and previous config saved to /var/cache/conftool/dbconfig/20220223-123747-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21373 and previous config saved to /var/cache/conftool/dbconfig/20220223-123246-ladsgroup.json
  • 12:30 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300774)', diff saved to https://phabricator.wikimedia.org/P21372 and previous config saved to /var/cache/conftool/dbconfig/20220223-123017-kormat.json
  • 12:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:24 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300774)', diff saved to https://phabricator.wikimedia.org/P21370 and previous config saved to /var/cache/conftool/dbconfig/20220223-122449-kormat.json
  • 12:24 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:24 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21369 and previous config saved to /var/cache/conftool/dbconfig/20220223-122242-ladsgroup.json
  • 12:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:10 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300774)', diff saved to https://phabricator.wikimedia.org/P21368 and previous config saved to /var/cache/conftool/dbconfig/20220223-121036-kormat.json
  • 12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:07 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21367 and previous config saved to /var/cache/conftool/dbconfig/20220223-120738-ladsgroup.json
  • 12:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:07 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:04 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:02 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 11:55 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P21366 and previous config saved to /var/cache/conftool/dbconfig/20220223-115531-kormat.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302363)', diff saved to https://phabricator.wikimedia.org/P21365 and previous config saved to /var/cache/conftool/dbconfig/20220223-115233-ladsgroup.json
  • 11:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1181.eqiad.wmnet with OS bullseye
  • 11:44 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 11:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 11:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:42 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:40 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P21364 and previous config saved to /var/cache/conftool/dbconfig/20220223-114026-kormat.json
  • 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1181.eqiad.wmnet with reason: host reimage
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21363 and previous config saved to /var/cache/conftool/dbconfig/20220223-113226-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300992)', diff saved to https://phabricator.wikimedia.org/P21362 and previous config saved to /var/cache/conftool/dbconfig/20220223-113219-ladsgroup.json
  • 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1181.eqiad.wmnet with reason: host reimage
  • 11:28 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:28 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:28 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:25 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300774)', diff saved to https://phabricator.wikimedia.org/P21361 and previous config saved to /var/cache/conftool/dbconfig/20220223-112522-kormat.json
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1181.eqiad.wmnet with OS bullseye
  • 11:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P21360 and previous config saved to /var/cache/conftool/dbconfig/20220223-111714-ladsgroup.json
  • 11:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 11:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T302363)', diff saved to https://phabricator.wikimedia.org/P21359 and previous config saved to /var/cache/conftool/dbconfig/20220223-110540-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P21358 and previous config saved to /var/cache/conftool/dbconfig/20220223-110209-ladsgroup.json
  • 10:49 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300992)', diff saved to https://phabricator.wikimedia.org/P21357 and previous config saved to /var/cache/conftool/dbconfig/20220223-104704-ladsgroup.json
  • 10:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:46 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300774)', diff saved to https://phabricator.wikimedia.org/P21356 and previous config saved to /var/cache/conftool/dbconfig/20220223-104644-kormat.json
  • 10:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:38 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 10:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:32 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300992)', diff saved to https://phabricator.wikimedia.org/P21355 and previous config saved to /var/cache/conftool/dbconfig/20220223-103204-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:32 kormat: running schema change against s3 T300774
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21354 and previous config saved to /var/cache/conftool/dbconfig/20220223-102919-ladsgroup.json
  • 10:14 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P21353 and previous config saved to /var/cache/conftool/dbconfig/20220223-101414-ladsgroup.json
  • 10:11 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P21352 and previous config saved to /var/cache/conftool/dbconfig/20220223-095909-ladsgroup.json
  • 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2077 (T302363)', diff saved to https://phabricator.wikimedia.org/P21351 and previous config saved to /var/cache/conftool/dbconfig/20220223-094655-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21350 and previous config saved to /var/cache/conftool/dbconfig/20220223-094405-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21349 and previous config saved to /var/cache/conftool/dbconfig/20220223-093933-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300992)', diff saved to https://phabricator.wikimedia.org/P21348 and previous config saved to /var/cache/conftool/dbconfig/20220223-093925-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2077.codfw.wmnet with OS bullseye
  • 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2077.codfw.wmnet with reason: host reimage
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P21347 and previous config saved to /var/cache/conftool/dbconfig/20220223-092421-ladsgroup.json
  • 09:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2077.codfw.wmnet with reason: host reimage
  • 09:14 dcausse: restarting blazegraph on wdqs1007 (JVM stuck for 11 hours)
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P21346 and previous config saved to /var/cache/conftool/dbconfig/20220223-090916-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2077.codfw.wmnet with OS bullseye
  • 09:02 godog: bounce prometheus-statsd-exporter on C:prometheus::statsd_exporter - T302372
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2077 (T302363)', diff saved to https://phabricator.wikimedia.org/P21345 and previous config saved to /var/cache/conftool/dbconfig/20220223-090109-ladsgroup.json
  • 09:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2077.codfw.wmnet with reason: Maintenance
  • 09:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2077.codfw.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T302363)', diff saved to https://phabricator.wikimedia.org/P21343 and previous config saved to /var/cache/conftool/dbconfig/20220223-085755-ladsgroup.json
  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300992)', diff saved to https://phabricator.wikimedia.org/P21342 and previous config saved to /var/cache/conftool/dbconfig/20220223-085411-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300992)', diff saved to https://phabricator.wikimedia.org/P21341 and previous config saved to /var/cache/conftool/dbconfig/20220223-084951-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300992)', diff saved to https://phabricator.wikimedia.org/P21340 and previous config saved to /var/cache/conftool/dbconfig/20220223-084938-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2108.codfw.wmnet with OS bullseye
  • 08:38 urbanecm: UTC morning B&C window done
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 10cb05a: Enable DiscussionTools newtopictool, topicsubscription on MediaWiki.org (T302256) (duration: 00m 49s)
  • 08:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P21339 and previous config saved to /var/cache/conftool/dbconfig/20220223-083433-ladsgroup.json
  • 08:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2108.codfw.wmnet with reason: host reimage
  • 08:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/includes/pager/IndexPager.php: 38f33d3: ReverseChronologicalPager: Fix displaying date headers for non-revisions (T302343; 5/5) (duration: 00m 48s)
  • 08:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/includes/pager/ReverseChronologicalPager.php: 38f33d3: ReverseChronologicalPager: Fix displaying date headers for non-revisions (T302343; 4/5) (duration: 00m 53s)
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:31 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:31 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/includes/specials/pagers/MergeHistoryPager.php: 38f33d3: ReverseChronologicalPager: Fix displaying date headers for non-revisions (T302343; 3/5) (duration: 00m 49s)
  • 08:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2108.codfw.wmnet with reason: host reimage
  • 08:30 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/includes/specials/pagers/ContribsPager.php: 38f33d3: ReverseChronologicalPager: Fix displaying date headers for non-revisions (T302343; 2/5) (duration: 00m 49s)
  • 08:29 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/includes/actions/pagers/HistoryPager.php: 38f33d3: ReverseChronologicalPager: Fix displaying date headers for non-revisions (T302343; 1/5) (duration: 00m 49s)
  • 08:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d9e8861: Enable mobile DiscussionTools at ht.wiki (T302259) (duration: 00m 50s)
  • 08:23 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/DiscussionTools/: 269dcfd: Mobile config: Always enable reply/newtopic tools on mobile, disable subscriptions (T302326) (duration: 00m 50s)
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:21 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.22/extensions/DiscussionTools/: b82e4eb: Mobile config: Always enable reply/newtopic tools on mobile, disable subscriptions (T302326) (duration: 00m 52s)
  • 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P21338 and previous config saved to /var/cache/conftool/dbconfig/20220223-081929-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2108.codfw.wmnet with OS bullseye
  • 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T302363)', diff saved to https://phabricator.wikimedia.org/P21337 and previous config saved to /var/cache/conftool/dbconfig/20220223-081338-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 08:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T302363)', diff saved to https://phabricator.wikimedia.org/P21336 and previous config saved to /var/cache/conftool/dbconfig/20220223-080609-ladsgroup.json
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300992)', diff saved to https://phabricator.wikimedia.org/P21335 and previous config saved to /var/cache/conftool/dbconfig/20220223-080424-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300992)', diff saved to https://phabricator.wikimedia.org/P21334 and previous config saved to /var/cache/conftool/dbconfig/20220223-075926-ladsgroup.json
  • 07:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21333 and previous config saved to /var/cache/conftool/dbconfig/20220223-075918-ladsgroup.json
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P21332 and previous config saved to /var/cache/conftool/dbconfig/20220223-074413-ladsgroup.json
  • 07:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P21331 and previous config saved to /var/cache/conftool/dbconfig/20220223-072909-ladsgroup.json
  • 07:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21330 and previous config saved to /var/cache/conftool/dbconfig/20220223-071404-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2118.codfw.wmnet with OS bullseye
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T300992)', diff saved to https://phabricator.wikimedia.org/P21329 and previous config saved to /var/cache/conftool/dbconfig/20220223-071038-ladsgroup.json
  • 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 07:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 07:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2118.codfw.wmnet with reason: host reimage
  • 06:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 06:54 Amir1: dbmaint on s2@codfw (T300992)
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2118.codfw.wmnet with reason: host reimage
  • 06:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2118.codfw.wmnet with OS bullseye
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T302363)', diff saved to https://phabricator.wikimedia.org/P21328 and previous config saved to /var/cache/conftool/dbconfig/20220223-063733-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T302363)', diff saved to https://phabricator.wikimedia.org/P21327 and previous config saved to /var/cache/conftool/dbconfig/20220223-063625-ladsgroup.json
  • 06:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2120.codfw.wmnet with OS bullseye
  • 06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2120.codfw.wmnet with reason: host reimage
  • 06:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2120.codfw.wmnet with reason: host reimage
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2120.codfw.wmnet with OS bullseye
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T302363)', diff saved to https://phabricator.wikimedia.org/P21326 and previous config saved to /var/cache/conftool/dbconfig/20220223-055534-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T302363)', diff saved to https://phabricator.wikimedia.org/P21325 and previous config saved to /var/cache/conftool/dbconfig/20220223-055416-ladsgroup.json
  • 05:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2122.codfw.wmnet with OS bullseye
  • 05:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2122.codfw.wmnet with reason: host reimage
  • 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2122.codfw.wmnet with reason: host reimage
  • 05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2122.codfw.wmnet with OS bullseye
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T302363)', diff saved to https://phabricator.wikimedia.org/P21324 and previous config saved to /var/cache/conftool/dbconfig/20220223-051125-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T302363)', diff saved to https://phabricator.wikimedia.org/P21323 and previous config saved to /var/cache/conftool/dbconfig/20220223-051026-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2150.codfw.wmnet with OS bullseye
  • 04:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
  • 04:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2150.codfw.wmnet with reason: host reimage
  • 04:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 04:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:36 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.22/includes/page/ParserOutputAccess.php: Backport: ParserOutputAccess: Check for latest revision when checking for cache (T283029) (duration: 00m 50s)
  • 04:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 04:33 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/includes/page/ParserOutputAccess.php: Backport: ParserOutputAccess: Check for latest revision when checking for cache (T283029) (duration: 00m 51s)
  • 04:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2150.codfw.wmnet with OS bullseye
  • 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T302363)', diff saved to https://phabricator.wikimedia.org/P21322 and previous config saved to /var/cache/conftool/dbconfig/20220223-042802-ladsgroup.json
  • 04:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 02:49 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2068.codfw.wmnet with OS stretch
  • 02:49 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2066.codfw.wmnet with OS stretch
  • 02:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 02:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 01:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS stretch
  • 01:50 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2066.codfw.wmnet with OS stretch
  • 01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 01:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2068.codfw.wmnet with reason: host reimage
  • 01:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 01:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2066.codfw.wmnet with reason: host reimage
  • 01:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2068.codfw.wmnet with OS stretch
  • 01:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 01:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2066.codfw.wmnet with OS stretch
  • 01:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS stretch
  • 01:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: host reimage
  • 01:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: host reimage
  • 00:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 00:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 00:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: host reimage
  • 00:52 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: host reimage
  • 00:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 00:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 00:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 00:29 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 00:23 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 00:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS stretch

2022-02-22

  • 23:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: host reimage
  • 23:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1004.wikimedia.org with reason: host reimage
  • 23:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:53 dduvall@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: Backport: VisualEditor: Avoid undefined index for mobileformat ([T302344]) (duration: 00m 49s)
  • 22:52 dduvall@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/DiscussionTools/includes/ApiDiscussionToolsEdit.php: Backport: DiscussionTools: Avoid undefined index for mobileformat ([T302344]) (duration: 00m 51s)
  • 22:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 22:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 22:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2067.codfw.wmnet with OS stretch
  • 22:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2078.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 21:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2078.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS stretch
  • 21:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2077.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:43 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: 91b81ac: filebackend: migrate $wmfSwift* to $wmgSwift* (T45956) (duration: 00m 52s)
  • 21:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 99f244c: [Cleanup] Remove non-existent config wgVectorUseWvuiSearch (duration: 00m 50s)
  • 21:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7172327: [Vector] Enable table of contents on beta cluster (duration: 00m 50s)
  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6d1d9a9: InitialiseSettings: General cleanup, wgRemoveGroups (A-D) (T301647) (duration: 00m 50s)
  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2077.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2076.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ee7608c: Deploy the fawiki test safety survey to production (T297629) (duration: 00m 51s)
  • 21:19 cwhite: end opensearch upgrade (codfw) T299168
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:12 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host elastic2076.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2069.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1004.wikimedia.org with OS bullseye
  • 20:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2069.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2068.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:26 cwhite: begin opensearch upgrade (codfw) T299168
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2068.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:09 ryankemper: T302340 [WCQS] Seeing `0.3.104` running on the hosts now
  • 20:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2067.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:08 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@5d384a5] (wcqs): Deploy 0.3.104 to WCQS (duration: 02m 33s)
  • 20:07 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.23 refs T300199
  • 20:06 ryankemper@deploy1002: Started deploy [wdqs/wdqs@5d384a5] (wcqs): Deploy 0.3.104 to WCQS
  • 20:06 ryankemper: T302340 [WCQS] Forgot to fetch & rebase `deploy1002:/srv/deployment/wdqs/wdqs` before deploy, so `0.3.104` did not actually deploy (still on `0.3.103`). Re-rolling deploy...
  • 20:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@f0d05eb] (wcqs): Deploy 0.3.104 to WCQS (duration: 03m 00s)
  • 19:58 ryankemper: T302340 `scap deploy -v --environment wcqs 'Deploy 0.3.104 to WCQS'`
  • 19:57 ryankemper@deploy1002: Started deploy [wdqs/wdqs@f0d05eb] (wcqs): Deploy 0.3.104 to WCQS
  • 19:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2067.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2066.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:25 ryankemper: T302330 `ryankemper@cumin1001:~$ sudo -E cumin '*mwmaint*' 'run-puppet-agent'` (getting https://gerrit.wikimedia.org/r/c/operations/puppet/+/764875 out)
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:24 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash[2004-2006].codfw.wmnet
  • 19:20 dduvall@deploy1002: Pruned MediaWiki: 1.38.0-wmf.21 (duration: 03m 50s)
  • 19:16 dduvall@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.23 refs T300199 (duration: 49m 17s)
  • 19:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash[2004-2006].codfw.wmnet
  • 19:10 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash[1007-1009].eqiad.wmnet
  • 19:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2066.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:58 ssastry@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 18:56 ssastry@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 18:55 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash[1007-1009].eqiad.wmnet
  • 18:53 ssastry@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 18:52 ssastry@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 18:50 ssastry@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 18:49 ssastry@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 18:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog2001.codfw.wmnet
  • 18:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:30 moritzm: rebalance ganeti eqiad row_B (all nodes reimaged in there) T296721
  • 18:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:27 dduvall@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.23 refs T300199
  • 18:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog2001.codfw.wmnet
  • 18:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:52 gehel: depooling WDQS codfw (internal + public) - issues with deployment of new updater version on codfw
  • 17:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 17:01 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21316 and previous config saved to /var/cache/conftool/dbconfig/20220222-164604-kormat.json
  • 16:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:39 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:30 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21315 and previous config saved to /var/cache/conftool/dbconfig/20220222-163059-kormat.json
  • 16:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 16:15 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21314 and previous config saved to /var/cache/conftool/dbconfig/20220222-161554-kormat.json
  • 16:15 papaul: rebooting scs-oe16-esams to clear librenms alert
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21313 and previous config saved to /var/cache/conftool/dbconfig/20220222-160049-kormat.json
  • 15:54 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:43 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:27 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21312 and previous config saved to /var/cache/conftool/dbconfig/20220222-152658-kormat.json
  • 15:26 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:26 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 15:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:25 urbanecm: Migration of oversight => suppress is done (T112147)
  • 15:25 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript migrateUserGroup.php --wiki=labswiki oversight suppress # T112147
  • 15:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:24 urbanecm: Run `mwscript purgeExpiredUserrights.php enwikiquote` to purge an expired but not yet removed row with the old oversight group (T112147)
  • 15:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:20 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:20 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4a2a212: Update oversight group to suppress (T112147) (duration: 00m 49s)
  • 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:13 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 79cfa4e: Remove the oversight group hack (T112147) (duration: 00m 48s)
  • 15:07 urbanecm: Finishing deployment of T112147 that started during B&C time
  • 14:54 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic
  • 14:53 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic
  • 14:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:32 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:31 urbanecm: Run `[urbanecm@mwmaint1002 ~]$ foreachwikiindblist oversight-wikis migrateUserGroup.php oversight suppress` in a tmux session (oversight-wikis.dblist is a temporary dblist from P21310; T112147)
  • 14:30 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21311 and previous config saved to /var/cache/conftool/dbconfig/20220222-143023-kormat.json
  • 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:24 urbanecm: mwscript migrateUserGroup.php --wiki=metawiki oversight suppress # T112147
  • 14:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:22 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ec07ac0: Add suppress group to privileged groups (T112147) (duration: 00m 49s)
  • 14:21 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 6859cd2: Do not delete the suppress group (T112147) (duration: 00m 50s)
  • 14:15 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21309 and previous config saved to /var/cache/conftool/dbconfig/20220222-141518-kormat.json
  • 14:14 taavi: deploy T302248 patch
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21308 and previous config saved to /var/cache/conftool/dbconfig/20220222-141338-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21307 and previous config saved to /var/cache/conftool/dbconfig/20220222-141148-root.json
  • 14:10 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:10 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:07 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21306 and previous config saved to /var/cache/conftool/dbconfig/20220222-140013-kormat.json
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P21305 and previous config saved to /var/cache/conftool/dbconfig/20220222-135833-marostegui.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21304 and previous config saved to /var/cache/conftool/dbconfig/20220222-135644-root.json
  • 13:45 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21303 and previous config saved to /var/cache/conftool/dbconfig/20220222-134509-kormat.json
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P21302 and previous config saved to /var/cache/conftool/dbconfig/20220222-134329-marostegui.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21301 and previous config saved to /var/cache/conftool/dbconfig/20220222-134141-root.json
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:32 godog: bounce prometheus-blackbox-exporter on prometheus1005 - T302265
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21300 and previous config saved to /var/cache/conftool/dbconfig/20220222-132824-marostegui.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21299 and previous config saved to /var/cache/conftool/dbconfig/20220222-132637-root.json
  • 13:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1093.eqiad.wmnet with OS bullseye
  • 13:24 moritzm: rebalance ganeti eqiad row_D (all nodes reimaged in there) T296721
  • 13:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21298 and previous config saved to /var/cache/conftool/dbconfig/20220222-131854-marostegui.json
  • 13:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21297 and previous config saved to /var/cache/conftool/dbconfig/20220222-131846-marostegui.json
  • 13:13 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 13:11 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 13:05 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21296 and previous config saved to /var/cache/conftool/dbconfig/20220222-130342-marostegui.json
  • 13:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 12:59 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 12:50 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic
  • 12:50 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21295 and previous config saved to /var/cache/conftool/dbconfig/20220222-124837-marostegui.json
  • 12:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 12:47 godog: bounce prometheus-blackbox-exporter on prometheus1006 - T302265
  • 12:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 12:44 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21294 and previous config saved to /var/cache/conftool/dbconfig/20220222-124449-kormat.json
  • 12:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:44 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21293 and previous config saved to /var/cache/conftool/dbconfig/20220222-123332-marostegui.json
  • 12:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21292 and previous config saved to /var/cache/conftool/dbconfig/20220222-122351-marostegui.json
  • 12:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300381)', diff saved to https://phabricator.wikimedia.org/P21291 and previous config saved to /var/cache/conftool/dbconfig/20220222-122124-marostegui.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P21290 and previous config saved to /var/cache/conftool/dbconfig/20220222-120619-marostegui.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21289 and previous config saved to /var/cache/conftool/dbconfig/20220222-115808-ladsgroup.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P21288 and previous config saved to /var/cache/conftool/dbconfig/20220222-115114-marostegui.json
  • 11:46 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21287 and previous config saved to /var/cache/conftool/dbconfig/20220222-114304-ladsgroup.json
  • 11:42 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P21286 and previous config saved to /var/cache/conftool/dbconfig/20220222-114206-kormat.json
  • 11:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 11:37 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300381)', diff saved to https://phabricator.wikimedia.org/P21285 and previous config saved to /var/cache/conftool/dbconfig/20220222-113609-marostegui.json
  • 11:30 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1093.eqiad.wmnet with OS bullseye
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21284 and previous config saved to /var/cache/conftool/dbconfig/20220222-112759-ladsgroup.json
  • 11:27 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21283 and previous config saved to /var/cache/conftool/dbconfig/20220222-112702-kormat.json
  • 11:25 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 11:24 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 11:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:22 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:20 jbond: deploy netbox puppet refactor gerrit:764330 (should be noop)
  • 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: beta: Allow opening the alpha NewLexeme special page on beta-wikidatawiki (T301234) (Beta only) (duration: 00m 48s)
  • 11:20 jbond: deploy netbox puppet refactor (should be noop)
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21282 and previous config saved to /var/cache/conftool/dbconfig/20220222-111254-ladsgroup.json
  • 11:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21281 and previous config saved to /var/cache/conftool/dbconfig/20220222-111157-kormat.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300381)', diff saved to https://phabricator.wikimedia.org/P21280 and previous config saved to /var/cache/conftool/dbconfig/20220222-111144-marostegui.json
  • 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300381)', diff saved to https://phabricator.wikimedia.org/P21279 and previous config saved to /var/cache/conftool/dbconfig/20220222-111137-marostegui.json
  • 11:10 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 11:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1093.eqiad.wmnet with OS bullseye
  • 11:06 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 11:03 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T302185)', diff saved to https://phabricator.wikimedia.org/P21278 and previous config saved to /var/cache/conftool/dbconfig/20220222-110118-ladsgroup.json
  • 11:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 10:59 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 10:56 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P21277 and previous config saved to /var/cache/conftool/dbconfig/20220222-105653-kormat.json
  • 10:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P21276 and previous config saved to /var/cache/conftool/dbconfig/20220222-105632-marostegui.json
  • 10:56 Lucas_WMDE: Deployed patch for T302192
  • 10:48 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21275 and previous config saved to /var/cache/conftool/dbconfig/20220222-104613-ladsgroup.json
  • 10:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P21274 and previous config saved to /var/cache/conftool/dbconfig/20220222-104128-marostegui.json
  • 10:36 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21273 and previous config saved to /var/cache/conftool/dbconfig/20220222-103109-ladsgroup.json
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300381)', diff saved to https://phabricator.wikimedia.org/P21272 and previous config saved to /var/cache/conftool/dbconfig/20220222-102623-marostegui.json
  • 10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300381)', diff saved to https://phabricator.wikimedia.org/P21271 and previous config saved to /var/cache/conftool/dbconfig/20220222-101710-marostegui.json
  • 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:16 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P21270 and previous config saved to /var/cache/conftool/dbconfig/20220222-101649-kormat.json
  • 10:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T302185)', diff saved to https://phabricator.wikimedia.org/P21269 and previous config saved to /var/cache/conftool/dbconfig/20220222-101604-ladsgroup.json
  • 10:12 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 10:07 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1099.eqiad.wmnet with OS bullseye
  • 10:00 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 09:52 XioNoX: restarting cr2-drmrs for software upgrade
  • 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1099.eqiad.wmnet with reason: host reimage
  • 09:47 aqu@deploy1002: Finished deploy [analytics/refinery@ed5c9f9] (hadoop-test): Migrate aqs/hourly to Airflow TEST [analytics/refinery@ed5c9f9] (duration: 00m 03s)
  • 09:47 aqu@deploy1002: Started deploy [analytics/refinery@ed5c9f9] (hadoop-test): Migrate aqs/hourly to Airflow TEST [analytics/refinery@ed5c9f9]
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300381)', diff saved to https://phabricator.wikimedia.org/P21268 and previous config saved to /var/cache/conftool/dbconfig/20220222-094740-marostegui.json
  • 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1099.eqiad.wmnet with reason: host reimage
  • 09:43 jayme@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 aqu: Deploying analytics/refinery on hadoop-test only.
  • 09:38 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1099.eqiad.wmnet with OS bullseye
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P21267 and previous config saved to /var/cache/conftool/dbconfig/20220222-093235-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P21266 and previous config saved to /var/cache/conftool/dbconfig/20220222-091730-marostegui.json
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300381)', diff saved to https://phabricator.wikimedia.org/P21265 and previous config saved to /var/cache/conftool/dbconfig/20220222-090226-marostegui.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21264 and previous config saved to /var/cache/conftool/dbconfig/20220222-085835-ladsgroup.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300381)', diff saved to https://phabricator.wikimedia.org/P21263 and previous config saved to /var/cache/conftool/dbconfig/20220222-085752-marostegui.json
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T302185)', diff saved to https://phabricator.wikimedia.org/P21262 and previous config saved to /var/cache/conftool/dbconfig/20220222-085653-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T302185)', diff saved to https://phabricator.wikimedia.org/P21261 and previous config saved to /var/cache/conftool/dbconfig/20220222-085536-ladsgroup.json
  • 08:55 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@17a70a0]: Add aqs hourly (duration: 00m 08s)
  • 08:55 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@17a70a0]: Add aqs hourly
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21260 and previous config saved to /var/cache/conftool/dbconfig/20220222-084031-ladsgroup.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300381)', diff saved to https://phabricator.wikimedia.org/P21259 and previous config saved to /var/cache/conftool/dbconfig/20220222-083534-marostegui.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21258 and previous config saved to /var/cache/conftool/dbconfig/20220222-082527-ladsgroup.json
  • 08:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:21 taavi: UTC morning deploys done
  • 08:20 taavi@deploy1002: Synchronized php-1.38.0-wmf.22/extensions/VisualEditor/modules/ve-mw/init/targets/ve.init.mw.DesktopArticleTarget.js: Backport: Revert: Don't suppress teardown prompt when pressing escape (T302096) (duration: 00m 49s)
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P21257 and previous config saved to /var/cache/conftool/dbconfig/20220222-082029-marostegui.json
  • 08:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T302185)', diff saved to https://phabricator.wikimedia.org/P21256 and previous config saved to /var/cache/conftool/dbconfig/20220222-081022-ladsgroup.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P21255 and previous config saved to /var/cache/conftool/dbconfig/20220222-080525-marostegui.json
  • 07:51 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300381)', diff saved to https://phabricator.wikimedia.org/P21254 and previous config saved to /var/cache/conftool/dbconfig/20220222-075020-marostegui.json
  • 07:49 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300381)', diff saved to https://phabricator.wikimedia.org/P21253 and previous config saved to /var/cache/conftool/dbconfig/20220222-074106-marostegui.json
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui: dbmaint on non-pooled hosts s2@eqiad T300381
  • 07:13 marostegui: dbmaint on db2104 (and its replicas) s2@codfw T300381
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T302185)', diff saved to https://phabricator.wikimedia.org/P21252 and previous config saved to /var/cache/conftool/dbconfig/20220222-071003-ladsgroup.json
  • 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2082 (T302185)', diff saved to https://phabricator.wikimedia.org/P21251 and previous config saved to /var/cache/conftool/dbconfig/20220222-070759-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2082.codfw.wmnet with OS bullseye
  • 06:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2082.codfw.wmnet with reason: host reimage
  • 06:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2082.codfw.wmnet with reason: host reimage
  • 06:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2082.codfw.wmnet with OS bullseye
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2082 (T302185)', diff saved to https://phabricator.wikimedia.org/P21250 and previous config saved to /var/cache/conftool/dbconfig/20220222-062711-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Maintenance
  • 06:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Maintenance
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2085:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21249 and previous config saved to /var/cache/conftool/dbconfig/20220222-062443-ladsgroup.json
  • 06:22 marostegui: dbmaint on db2077 s7@codfw T302222
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2085:3311 (T302185)', diff saved to https://phabricator.wikimedia.org/P21248 and previous config saved to /var/cache/conftool/dbconfig/20220222-062018-ladsgroup.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300775)', diff saved to https://phabricator.wikimedia.org/P21247 and previous config saved to /var/cache/conftool/dbconfig/20220222-061235-marostegui.json
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 06:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2085.codfw.wmnet with OS bullseye
  • 06:10 marostegui: dbmaint on db2077 s7@codfw T302222
  • 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2085.codfw.wmnet with reason: host reimage
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2085.codfw.wmnet with reason: host reimage
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2085.codfw.wmnet with OS bullseye
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2085:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21246 and previous config saved to /var/cache/conftool/dbconfig/20220222-053901-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2085:3311 (T302185)', diff saved to https://phabricator.wikimedia.org/P21245 and previous config saved to /var/cache/conftool/dbconfig/20220222-053836-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2085.codfw.wmnet with reason: Maintenance
  • 05:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2085.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2086:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21244 and previous config saved to /var/cache/conftool/dbconfig/20220222-053525-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2086:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21243 and previous config saved to /var/cache/conftool/dbconfig/20220222-053102-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2086.codfw.wmnet with OS bullseye
  • 05:16 Amir1: dbmaint on s1@codfw (T302185)
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2086.codfw.wmnet with reason: host reimage
  • 05:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2086.codfw.wmnet with reason: host reimage
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300992)', diff saved to https://phabricator.wikimedia.org/P21242 and previous config saved to /var/cache/conftool/dbconfig/20220222-045511-ladsgroup.json
  • 04:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2086.codfw.wmnet with OS bullseye
  • 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2086:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21241 and previous config saved to /var/cache/conftool/dbconfig/20220222-045406-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2086:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21240 and previous config saved to /var/cache/conftool/dbconfig/20220222-045349-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2086.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2086.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P21239 and previous config saved to /var/cache/conftool/dbconfig/20220222-044006-ladsgroup.json
  • 04:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2080 (T302185)', diff saved to https://phabricator.wikimedia.org/P21238 and previous config saved to /var/cache/conftool/dbconfig/20220222-042940-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P21237 and previous config saved to /var/cache/conftool/dbconfig/20220222-042502-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2080.codfw.wmnet with OS bullseye
  • 04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2080.codfw.wmnet with reason: host reimage
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300992)', diff saved to https://phabricator.wikimedia.org/P21236 and previous config saved to /var/cache/conftool/dbconfig/20220222-040957-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2080.codfw.wmnet with reason: host reimage
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300992)', diff saved to https://phabricator.wikimedia.org/P21235 and previous config saved to /var/cache/conftool/dbconfig/20220222-040537-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 04:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2080.codfw.wmnet with OS bullseye
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2080 (T302185)', diff saved to https://phabricator.wikimedia.org/P21234 and previous config saved to /var/cache/conftool/dbconfig/20220222-035419-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2080.codfw.wmnet with reason: Maintenance
  • 03:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2080.codfw.wmnet with reason: Maintenance
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2081 (T302185)', diff saved to https://phabricator.wikimedia.org/P21233 and previous config saved to /var/cache/conftool/dbconfig/20220222-035257-ladsgroup.json
  • 03:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2081.codfw.wmnet with OS bullseye
  • 03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2081.codfw.wmnet with reason: host reimage
  • 03:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2081.codfw.wmnet with reason: host reimage
  • 03:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2081.codfw.wmnet with OS bullseye
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2081 (T302185)', diff saved to https://phabricator.wikimedia.org/P21232 and previous config saved to /var/cache/conftool/dbconfig/20220222-030456-ladsgroup.json
  • 03:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2081.codfw.wmnet with reason: Maintenance
  • 03:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2081.codfw.wmnet with reason: Maintenance
  • 02:46 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.wikimedia.org with OS bullseye
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.wikimedia.org with reason: host reimage
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.wikimedia.org with reason: host reimage
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.wikimedia.org with OS bullseye

2022-02-21

  • 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300381)', diff saved to https://phabricator.wikimedia.org/P21231 and previous config saved to /var/cache/conftool/dbconfig/20220221-223015-marostegui.json
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21230 and previous config saved to /var/cache/conftool/dbconfig/20220221-221510-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21229 and previous config saved to /var/cache/conftool/dbconfig/20220221-220005-marostegui.json
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300381)', diff saved to https://phabricator.wikimedia.org/P21228 and previous config saved to /var/cache/conftool/dbconfig/20220221-214500-marostegui.json
  • 21:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300381)', diff saved to https://phabricator.wikimedia.org/P21227 and previous config saved to /var/cache/conftool/dbconfig/20220221-213411-marostegui.json
  • 21:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 21:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 21:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300381)', diff saved to https://phabricator.wikimedia.org/P21226 and previous config saved to /var/cache/conftool/dbconfig/20220221-213403-marostegui.json
  • 21:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21225 and previous config saved to /var/cache/conftool/dbconfig/20220221-211859-marostegui.json
  • 21:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21224 and previous config saved to /var/cache/conftool/dbconfig/20220221-210354-marostegui.json
  • 20:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300381)', diff saved to https://phabricator.wikimedia.org/P21223 and previous config saved to /var/cache/conftool/dbconfig/20220221-204849-marostegui.json
  • 20:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300381)', diff saved to https://phabricator.wikimedia.org/P21222 and previous config saved to /var/cache/conftool/dbconfig/20220221-203708-marostegui.json
  • 20:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 20:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300381)', diff saved to https://phabricator.wikimedia.org/P21221 and previous config saved to /var/cache/conftool/dbconfig/20220221-203701-marostegui.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21220 and previous config saved to /var/cache/conftool/dbconfig/20220221-202156-marostegui.json
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21219 and previous config saved to /var/cache/conftool/dbconfig/20220221-200651-marostegui.json
  • 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300381)', diff saved to https://phabricator.wikimedia.org/P21218 and previous config saved to /var/cache/conftool/dbconfig/20220221-195147-marostegui.json
  • 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300381)', diff saved to https://phabricator.wikimedia.org/P21217 and previous config saved to /var/cache/conftool/dbconfig/20220221-193842-marostegui.json
  • 19:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 19:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 19:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 19:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300381)', diff saved to https://phabricator.wikimedia.org/P21216 and previous config saved to /var/cache/conftool/dbconfig/20220221-192309-marostegui.json
  • 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21215 and previous config saved to /var/cache/conftool/dbconfig/20220221-190801-marostegui.json
  • 19:03 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1003.wikimedia.org with OS bullseye
  • 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21214 and previous config saved to /var/cache/conftool/dbconfig/20220221-185256-marostegui.json
  • 18:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300381)', diff saved to https://phabricator.wikimedia.org/P21213 and previous config saved to /var/cache/conftool/dbconfig/20220221-183751-marostegui.json
  • 18:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300774)', diff saved to https://phabricator.wikimedia.org/P21212 and previous config saved to /var/cache/conftool/dbconfig/20220221-183304-kormat.json
  • 18:33 urbanecm: Password reset for Jrnka ka@SUL per Ticket#2022022010002692
  • 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300381)', diff saved to https://phabricator.wikimedia.org/P21211 and previous config saved to /var/cache/conftool/dbconfig/20220221-182856-marostegui.json
  • 18:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300381)', diff saved to https://phabricator.wikimedia.org/P21210 and previous config saved to /var/cache/conftool/dbconfig/20220221-182849-marostegui.json
  • 18:18 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21209 and previous config saved to /var/cache/conftool/dbconfig/20220221-181800-kormat.json
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21208 and previous config saved to /var/cache/conftool/dbconfig/20220221-181344-marostegui.json
  • 18:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2004.codfw.wmnet with OS bullseye
  • 18:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1003.wikimedia.org with reason: host reimage
  • 18:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1003.wikimedia.org with reason: host reimage
  • 18:02 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21207 and previous config saved to /var/cache/conftool/dbconfig/20220221-180255-kormat.json
  • 18:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 18:02 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21206 and previous config saved to /var/cache/conftool/dbconfig/20220221-175839-marostegui.json
  • 17:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
  • 17:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
  • 17:50 aqu@deploy1002: Finished deploy [airflow-dags/analytics@17a70a0]: fix missing extra_query_parameters (duration: 00m 07s)
  • 17:50 aqu@deploy1002: Started deploy [airflow-dags/analytics@17a70a0]: fix missing extra_query_parameters
  • 17:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300774)', diff saved to https://phabricator.wikimedia.org/P21205 and previous config saved to /var/cache/conftool/dbconfig/20220221-174750-kormat.json
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300381)', diff saved to https://phabricator.wikimedia.org/P21204 and previous config saved to /var/cache/conftool/dbconfig/20220221-174335-marostegui.json
  • 17:41 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300774)', diff saved to https://phabricator.wikimedia.org/P21203 and previous config saved to /var/cache/conftool/dbconfig/20220221-174138-kormat.json
  • 17:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 17:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 17:41 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300774)', diff saved to https://phabricator.wikimedia.org/P21202 and previous config saved to /var/cache/conftool/dbconfig/20220221-174130-kormat.json
  • 17:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2004.codfw.wmnet with OS bullseye
  • 17:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2003.codfw.wmnet with OS bullseye
  • 17:32 aqu@deploy1002: Finished deploy [airflow-dags/analytics@c2fdce7]: fix aqs hourly DAGs start date (duration: 00m 07s)
  • 17:32 aqu@deploy1002: Started deploy [airflow-dags/analytics@c2fdce7]: fix aqs hourly DAGs start date
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300381)', diff saved to https://phabricator.wikimedia.org/P21201 and previous config saved to /var/cache/conftool/dbconfig/20220221-173130-marostegui.json
  • 17:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 17:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300381)', diff saved to https://phabricator.wikimedia.org/P21200 and previous config saved to /var/cache/conftool/dbconfig/20220221-173122-marostegui.json
  • 17:26 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21199 and previous config saved to /var/cache/conftool/dbconfig/20220221-172626-kormat.json
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1003.wikimedia.org with OS bullseye
  • 17:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
  • 17:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
  • 17:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21198 and previous config saved to /var/cache/conftool/dbconfig/20220221-171618-marostegui.json
  • 17:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21197 and previous config saved to /var/cache/conftool/dbconfig/20220221-171121-kormat.json
  • 17:06 aqu@deploy1002: Finished deploy [airflow-dags/analytics@f1244e0]: Migrate aqs/hourly from Oozie|Hive to Airflow|Spark (duration: 00m 07s)
  • 17:06 aqu@deploy1002: Started deploy [airflow-dags/analytics@f1244e0]: Migrate aqs/hourly from Oozie|Hive to Airflow|Spark
  • 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21196 and previous config saved to /var/cache/conftool/dbconfig/20220221-170113-marostegui.json
  • 16:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2003.codfw.wmnet with OS bullseye
  • 16:56 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300774)', diff saved to https://phabricator.wikimedia.org/P21195 and previous config saved to /var/cache/conftool/dbconfig/20220221-165616-kormat.json
  • 16:54 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300774)', diff saved to https://phabricator.wikimedia.org/P21194 and previous config saved to /var/cache/conftool/dbconfig/20220221-165405-kormat.json
  • 16:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:53 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300774)', diff saved to https://phabricator.wikimedia.org/P21193 and previous config saved to /var/cache/conftool/dbconfig/20220221-165352-kormat.json
  • 16:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2002.codfw.wmnet with OS bullseye
  • 16:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 16:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300381)', diff saved to https://phabricator.wikimedia.org/P21192 and previous config saved to /var/cache/conftool/dbconfig/20220221-164608-marostegui.json
  • 16:44 mforns@deploy1002: Finished deploy [analytics/refinery@ed5c9f9] (hadoop-test): Deploy Aqs Hourly for Airflow THIN [analytics/refinery@ed5c9f9] (duration: 07m 12s)
  • 16:38 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21191 and previous config saved to /var/cache/conftool/dbconfig/20220221-163847-kormat.json
  • 16:38 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
  • 16:37 mforns@deploy1002: Started deploy [analytics/refinery@ed5c9f9] (hadoop-test): Deploy Aqs Hourly for Airflow THIN [analytics/refinery@ed5c9f9]
  • 16:37 mforns@deploy1002: Finished deploy [analytics/refinery@ed5c9f9] (thin): Deploy Aqs Hourly for Airflow THIN [analytics/refinery@ed5c9f9] (duration: 00m 07s)
  • 16:36 mforns@deploy1002: Started deploy [analytics/refinery@ed5c9f9] (thin): Deploy Aqs Hourly for Airflow THIN [analytics/refinery@ed5c9f9]
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300381)', diff saved to https://phabricator.wikimedia.org/P21190 and previous config saved to /var/cache/conftool/dbconfig/20220221-163555-marostegui.json
  • 16:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300381)', diff saved to https://phabricator.wikimedia.org/P21189 and previous config saved to /var/cache/conftool/dbconfig/20220221-163548-marostegui.json
  • 16:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
  • 16:30 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1093.eqiad.wmnet with OS bullseye
  • 16:23 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21188 and previous config saved to /var/cache/conftool/dbconfig/20220221-162342-kormat.json
  • 16:21 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 16:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21187 and previous config saved to /var/cache/conftool/dbconfig/20220221-162043-marostegui.json
  • 16:18 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2002.codfw.wmnet with OS bullseye
  • 16:17 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 16:08 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300774)', diff saved to https://phabricator.wikimedia.org/P21186 and previous config saved to /var/cache/conftool/dbconfig/20220221-160838-kormat.json
  • 16:05 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-serve200[5-8].codfw.wmnet
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21185 and previous config saved to /var/cache/conftool/dbconfig/20220221-160538-marostegui.json
  • 16:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=ml_serve,service=kubesvc
  • 16:03 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=ml-serve,service=kubesvc
  • 16:01 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 16:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2001.codfw.wmnet with OS bullseye
  • 15:59 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300774)', diff saved to https://phabricator.wikimedia.org/P21184 and previous config saved to /var/cache/conftool/dbconfig/20220221-155924-kormat.json
  • 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:52 mforns@deploy1002: Finished deploy [analytics/refinery@ed5c9f9]: Deploy Aqs Hourly for Airflow [analytics/refinery@ed5c9f9] (duration: 21m 23s)
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300381)', diff saved to https://phabricator.wikimedia.org/P21183 and previous config saved to /var/cache/conftool/dbconfig/20220221-155034-marostegui.json
  • 15:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
  • 15:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:45 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
  • 15:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:45 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21182 and previous config saved to /var/cache/conftool/dbconfig/20220221-154518-kormat.json
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300381)', diff saved to https://phabricator.wikimedia.org/P21181 and previous config saved to /var/cache/conftool/dbconfig/20220221-154118-marostegui.json
  • 15:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300381)', diff saved to https://phabricator.wikimedia.org/P21180 and previous config saved to /var/cache/conftool/dbconfig/20220221-154110-marostegui.json
  • 15:30 mforns@deploy1002: Started deploy [analytics/refinery@ed5c9f9]: Deploy Aqs Hourly for Airflow [analytics/refinery@ed5c9f9]
  • 15:30 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21179 and previous config saved to /var/cache/conftool/dbconfig/20220221-153013-kormat.json
  • 15:28 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2001.codfw.wmnet with OS bullseye
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21178 and previous config saved to /var/cache/conftool/dbconfig/20220221-152606-marostegui.json
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21177 and previous config saved to /var/cache/conftool/dbconfig/20220221-151945-root.json
  • 15:15 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21176 and previous config saved to /var/cache/conftool/dbconfig/20220221-151509-kormat.json
  • 15:11 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21175 and previous config saved to /var/cache/conftool/dbconfig/20220221-151101-marostegui.json
  • 15:10 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:09 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21174 and previous config saved to /var/cache/conftool/dbconfig/20220221-150848-root.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21173 and previous config saved to /var/cache/conftool/dbconfig/20220221-150442-root.json
  • 15:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21172 and previous config saved to /var/cache/conftool/dbconfig/20220221-150004-kormat.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300381)', diff saved to https://phabricator.wikimedia.org/P21171 and previous config saved to /var/cache/conftool/dbconfig/20220221-145556-marostegui.json
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21170 and previous config saved to /var/cache/conftool/dbconfig/20220221-145345-root.json
  • 14:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1093.eqiad.wmnet with OS bullseye
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21169 and previous config saved to /var/cache/conftool/dbconfig/20220221-144938-root.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300381)', diff saved to https://phabricator.wikimedia.org/P21168 and previous config saved to /var/cache/conftool/dbconfig/20220221-144707-marostegui.json
  • 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300381)', diff saved to https://phabricator.wikimedia.org/P21167 and previous config saved to /var/cache/conftool/dbconfig/20220221-143931-marostegui.json
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21166 and previous config saved to /var/cache/conftool/dbconfig/20220221-143841-root.json
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21165 and previous config saved to /var/cache/conftool/dbconfig/20220221-143435-root.json
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21164 and previous config saved to /var/cache/conftool/dbconfig/20220221-142426-marostegui.json
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21163 and previous config saved to /var/cache/conftool/dbconfig/20220221-142337-root.json
  • 14:22 moritzm: installing twisted security updates
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21162 and previous config saved to /var/cache/conftool/dbconfig/20220221-141931-root.json
  • 14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21161 and previous config saved to /var/cache/conftool/dbconfig/20220221-140922-marostegui.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21160 and previous config saved to /var/cache/conftool/dbconfig/20220221-140831-root.json
  • 14:05 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 14:00 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1093.eqiad.wmnet with reason: host reimage
  • 13:59 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300774)', diff saved to https://phabricator.wikimedia.org/P21159 and previous config saved to /var/cache/conftool/dbconfig/20220221-135945-kormat.json
  • 13:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:59 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21158 and previous config saved to /var/cache/conftool/dbconfig/20220221-135937-kormat.json
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300381)', diff saved to https://phabricator.wikimedia.org/P21156 and previous config saved to /var/cache/conftool/dbconfig/20220221-135417-marostegui.json
  • 13:49 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300381)', diff saved to https://phabricator.wikimedia.org/P21154 and previous config saved to /var/cache/conftool/dbconfig/20220221-134542-marostegui.json
  • 13:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:44 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21153 and previous config saved to /var/cache/conftool/dbconfig/20220221-134433-kormat.json
  • 13:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300381)', diff saved to https://phabricator.wikimedia.org/P21152 and previous config saved to /var/cache/conftool/dbconfig/20220221-133818-marostegui.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21151 and previous config saved to /var/cache/conftool/dbconfig/20220221-133350-root.json
  • 13:29 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21150 and previous config saved to /var/cache/conftool/dbconfig/20220221-132928-kormat.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21149 and previous config saved to /var/cache/conftool/dbconfig/20220221-132313-marostegui.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21148 and previous config saved to /var/cache/conftool/dbconfig/20220221-131846-root.json
  • 13:14 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21147 and previous config saved to /var/cache/conftool/dbconfig/20220221-131423-kormat.json
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21146 and previous config saved to /var/cache/conftool/dbconfig/20220221-130808-marostegui.json
  • 13:06 moritzm: rebalance ganeti row_C (add nodes reimaged in there) T296721
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1009.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21145 and previous config saved to /var/cache/conftool/dbconfig/20220221-130343-root.json
  • 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1009.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 12:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 12:53 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21144 and previous config saved to /var/cache/conftool/dbconfig/20220221-125326-kormat.json
  • 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300381)', diff saved to https://phabricator.wikimedia.org/P21143 and previous config saved to /var/cache/conftool/dbconfig/20220221-125303-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21142 and previous config saved to /var/cache/conftool/dbconfig/20220221-124839-root.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300381)', diff saved to https://phabricator.wikimedia.org/P21141 and previous config saved to /var/cache/conftool/dbconfig/20220221-124215-marostegui.json
  • 12:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:36 marostegui: Rebuild templatelinks table on db2077 (s7) T301848
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1017.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21140 and previous config saved to /var/cache/conftool/dbconfig/20220221-123335-root.json
  • 12:30 Lucas_WMDE: Deployed patch for T302215
  • 12:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:28 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21139 and previous config saved to /var/cache/conftool/dbconfig/20220221-122821-kormat.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P21138 and previous config saved to /var/cache/conftool/dbconfig/20220221-122727-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300381)', diff saved to https://phabricator.wikimedia.org/P21137 and previous config saved to /var/cache/conftool/dbconfig/20220221-122504-marostegui.json
  • 12:14 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1017.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:13 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21136 and previous config saved to /var/cache/conftool/dbconfig/20220221-121316-kormat.json
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21135 and previous config saved to /var/cache/conftool/dbconfig/20220221-120959-marostegui.json
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 11:58 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21134 and previous config saved to /var/cache/conftool/dbconfig/20220221-115811-kormat.json
  • 11:58 marostegui: Rebuild templatelinks table on db1129 (s2) T301848
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 T301848', diff saved to https://phabricator.wikimedia.org/P21133 and previous config saved to /var/cache/conftool/dbconfig/20220221-115750-marostegui.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21132 and previous config saved to /var/cache/conftool/dbconfig/20220221-115455-marostegui.json
  • 11:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:43 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21131 and previous config saved to /var/cache/conftool/dbconfig/20220221-114307-kormat.json
  • 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300381)', diff saved to https://phabricator.wikimedia.org/P21130 and previous config saved to /var/cache/conftool/dbconfig/20220221-113950-marostegui.json
  • 11:28 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes-staging,service=kubesvc
  • 11:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21129 and previous config saved to /var/cache/conftool/dbconfig/20220221-112809-kormat.json
  • 11:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:28 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21128 and previous config saved to /var/cache/conftool/dbconfig/20220221-112801-kormat.json
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1012.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1012.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 11:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS bullseye
  • 11:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21127 and previous config saved to /var/cache/conftool/dbconfig/20220221-111256-kormat.json
  • 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
  • 11:09 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
  • 11:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bullseye
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1022.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 10:57 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21126 and previous config saved to /var/cache/conftool/dbconfig/20220221-105752-kormat.json
  • 10:57 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1022.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 10:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
  • 10:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 10:53 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS bullseye
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 10:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 10:42 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21125 and previous config saved to /var/cache/conftool/dbconfig/20220221-104247-kormat.json
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300381)', diff saved to https://phabricator.wikimedia.org/P21124 and previous config saved to /var/cache/conftool/dbconfig/20220221-103931-marostegui.json
  • 10:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21123 and previous config saved to /var/cache/conftool/dbconfig/20220221-103924-marostegui.json
  • 10:32 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bullseye
  • 10:30 Lucas_WMDE: Deployed patch for T302192
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21122 and previous config saved to /var/cache/conftool/dbconfig/20220221-102419-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21121 and previous config saved to /var/cache/conftool/dbconfig/20220221-102241-root.json
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21120 and previous config saved to /var/cache/conftool/dbconfig/20220221-100914-marostegui.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21119 and previous config saved to /var/cache/conftool/dbconfig/20220221-100737-root.json
  • 10:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:01 marostegui: Rebuild templatelinks table on s2 codfw master (db2104), lag to be expected on codfw T301848
  • 09:57 moritzm: installing PHP 7.4 security updates (as packaged in Debian)
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21118 and previous config saved to /var/cache/conftool/dbconfig/20220221-095410-marostegui.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21117 and previous config saved to /var/cache/conftool/dbconfig/20220221-095233-root.json
  • 09:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2001.codfw.wmnet with OS bullseye
  • 09:51 kormat: running schema change against s7 T300774
  • 09:51 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300774)', diff saved to https://phabricator.wikimedia.org/P21116 and previous config saved to /var/cache/conftool/dbconfig/20220221-095122-kormat.json
  • 09:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21115 and previous config saved to /var/cache/conftool/dbconfig/20220221-094826-marostegui.json
  • 09:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300381)', diff saved to https://phabricator.wikimedia.org/P21114 and previous config saved to /var/cache/conftool/dbconfig/20220221-094819-marostegui.json
  • 09:45 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes-staging,service=kubesvc
  • 09:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
  • 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21113 and previous config saved to /var/cache/conftool/dbconfig/20220221-093729-root.json
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS bullseye
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1009.eqiad.wmnet with OS buster
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21112 and previous config saved to /var/cache/conftool/dbconfig/20220221-093314-marostegui.json
  • 09:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 09:24 godog: deploy prometheus-icinga-exporter 0.19 - T300951
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21111 and previous config saved to /var/cache/conftool/dbconfig/20220221-092226-root.json
  • 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bullseye
  • 09:22 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2001.codfw.wmnet with OS bullseye
  • 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bullseye
  • 09:22 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21110 and previous config saved to /var/cache/conftool/dbconfig/20220221-091809-marostegui.json
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1009.eqiad.wmnet with reason: host reimage
  • 09:04 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS bullseye
  • 09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1009.eqiad.wmnet with reason: host reimage
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300381)', diff saved to https://phabricator.wikimedia.org/P21109 and previous config saved to /var/cache/conftool/dbconfig/20220221-090305-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300381)', diff saved to https://phabricator.wikimedia.org/P21108 and previous config saved to /var/cache/conftool/dbconfig/20220221-085745-marostegui.json
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1009.eqiad.wmnet with OS buster
  • 08:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21107 and previous config saved to /var/cache/conftool/dbconfig/20220221-084802-marostegui.json
  • 08:38 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21106 and previous config saved to /var/cache/conftool/dbconfig/20220221-083257-marostegui.json
  • 08:22 godog: update karma to 0.99 on alert* hosts - T284213
  • 08:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS bullseye
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21105 and previous config saved to /var/cache/conftool/dbconfig/20220221-081752-marostegui.json
  • 08:11 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 08:10 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 08:07 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21104 and previous config saved to /var/cache/conftool/dbconfig/20220221-080248-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21103 and previous config saved to /var/cache/conftool/dbconfig/20220221-075800-marostegui.json
  • 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21102 and previous config saved to /var/cache/conftool/dbconfig/20220221-075336-marostegui.json
  • 07:48 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS bullseye
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21101 and previous config saved to /var/cache/conftool/dbconfig/20220221-073831-marostegui.json
  • 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21100 and previous config saved to /var/cache/conftool/dbconfig/20220221-072326-marostegui.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 07:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:10 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21099 and previous config saved to /var/cache/conftool/dbconfig/20220221-070822-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300381)', diff saved to https://phabricator.wikimedia.org/P21098 and previous config saved to /var/cache/conftool/dbconfig/20220221-070240-marostegui.json
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300381)', diff saved to https://phabricator.wikimedia.org/P21097 and previous config saved to /var/cache/conftool/dbconfig/20220221-070233-marostegui.json
  • 06:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 06:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298554)', diff saved to https://phabricator.wikimedia.org/P21096 and previous config saved to /var/cache/conftool/dbconfig/20220221-065220-ladsgroup.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21095 and previous config saved to /var/cache/conftool/dbconfig/20220221-064728-marostegui.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1107.eqiad.wmnet with OS bullseye
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P21093 and previous config saved to /var/cache/conftool/dbconfig/20220221-063713-ladsgroup.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21092 and previous config saved to /var/cache/conftool/dbconfig/20220221-063223-marostegui.json
  • 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1107.eqiad.wmnet with reason: host reimage
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1107.eqiad.wmnet with reason: host reimage
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P21091 and previous config saved to /var/cache/conftool/dbconfig/20220221-062206-ladsgroup.json
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1107.eqiad.wmnet with OS bullseye
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300381)', diff saved to https://phabricator.wikimedia.org/P21090 and previous config saved to /var/cache/conftool/dbconfig/20220221-061719-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300381)', diff saved to https://phabricator.wikimedia.org/P21089 and previous config saved to /var/cache/conftool/dbconfig/20220221-061205-marostegui.json
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300775)', diff saved to https://phabricator.wikimedia.org/P21088 and previous config saved to /var/cache/conftool/dbconfig/20220221-060804-marostegui.json
  • 06:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298554)', diff saved to https://phabricator.wikimedia.org/P21087 and previous config saved to /var/cache/conftool/dbconfig/20220221-060701-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T298554)', diff saved to https://phabricator.wikimedia.org/P21086 and previous config saved to /var/cache/conftool/dbconfig/20220221-054612-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298554)', diff saved to https://phabricator.wikimedia.org/P21085 and previous config saved to /var/cache/conftool/dbconfig/20220221-054604-ladsgroup.json
  • 05:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P21084 and previous config saved to /var/cache/conftool/dbconfig/20220221-053059-ladsgroup.json
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P21083 and previous config saved to /var/cache/conftool/dbconfig/20220221-051555-ladsgroup.json
  • 05:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298554)', diff saved to https://phabricator.wikimedia.org/P21082 and previous config saved to /var/cache/conftool/dbconfig/20220221-050050-ladsgroup.json
  • 04:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2083 (T302185)', diff saved to https://phabricator.wikimedia.org/P21081 and previous config saved to /var/cache/conftool/dbconfig/20220221-045516-ladsgroup.json
  • 04:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2083.codfw.wmnet with OS bullseye
  • 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2083.codfw.wmnet with reason: host reimage
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T298554)', diff saved to https://phabricator.wikimedia.org/P21080 and previous config saved to /var/cache/conftool/dbconfig/20220221-043358-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298554)', diff saved to https://phabricator.wikimedia.org/P21079 and previous config saved to /var/cache/conftool/dbconfig/20220221-043350-ladsgroup.json
  • 04:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2083.codfw.wmnet with reason: host reimage
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P21078 and previous config saved to /var/cache/conftool/dbconfig/20220221-041846-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2083.codfw.wmnet with OS bullseye
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2083 (T302185)', diff saved to https://phabricator.wikimedia.org/P21077 and previous config saved to /var/cache/conftool/dbconfig/20220221-041529-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2083.codfw.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2083.codfw.wmnet with reason: Maintenance
  • 04:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2084 (T302185)', diff saved to https://phabricator.wikimedia.org/P21076 and previous config saved to /var/cache/conftool/dbconfig/20220221-041123-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P21075 and previous config saved to /var/cache/conftool/dbconfig/20220221-040341-ladsgroup.json
  • 03:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2084.codfw.wmnet with OS bullseye
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298554)', diff saved to https://phabricator.wikimedia.org/P21074 and previous config saved to /var/cache/conftool/dbconfig/20220221-034836-ladsgroup.json
  • 03:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2084.codfw.wmnet with reason: host reimage
  • 03:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T298554)', diff saved to https://phabricator.wikimedia.org/P21073 and previous config saved to /var/cache/conftool/dbconfig/20220221-034100-ladsgroup.json
  • 03:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 03:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298554)', diff saved to https://phabricator.wikimedia.org/P21072 and previous config saved to /var/cache/conftool/dbconfig/20220221-034052-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2084.codfw.wmnet with reason: host reimage
  • 03:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2084.codfw.wmnet with OS bullseye
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2084 (T302185)', diff saved to https://phabricator.wikimedia.org/P21071 and previous config saved to /var/cache/conftool/dbconfig/20220221-032548-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P21070 and previous config saved to /var/cache/conftool/dbconfig/20220221-032548-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2084.codfw.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2084.codfw.wmnet with reason: Maintenance
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2091 (T302185)', diff saved to https://phabricator.wikimedia.org/P21069 and previous config saved to /var/cache/conftool/dbconfig/20220221-031602-ladsgroup.json
  • 03:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P21068 and previous config saved to /var/cache/conftool/dbconfig/20220221-031039-ladsgroup.json
  • 03:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2091.codfw.wmnet with OS bullseye
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298554)', diff saved to https://phabricator.wikimedia.org/P21067 and previous config saved to /var/cache/conftool/dbconfig/20220221-025534-ladsgroup.json
  • 02:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2091.codfw.wmnet with reason: host reimage
  • 02:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2091.codfw.wmnet with reason: host reimage
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T298554)', diff saved to https://phabricator.wikimedia.org/P21066 and previous config saved to /var/cache/conftool/dbconfig/20220221-023852-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2091.codfw.wmnet with OS bullseye
  • 02:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2091 (T302185)', diff saved to https://phabricator.wikimedia.org/P21065 and previous config saved to /var/cache/conftool/dbconfig/20220221-023158-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2091.codfw.wmnet with reason: Maintenance
  • 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2091.codfw.wmnet with reason: Maintenance
  • 02:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T302185)', diff saved to https://phabricator.wikimedia.org/P21064 and previous config saved to /var/cache/conftool/dbconfig/20220221-022259-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 02:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298554)', diff saved to https://phabricator.wikimedia.org/P21063 and previous config saved to /var/cache/conftool/dbconfig/20220221-021943-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2152.codfw.wmnet with OS bullseye
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P21062 and previous config saved to /var/cache/conftool/dbconfig/20220221-020438-ladsgroup.json
  • 01:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2152.codfw.wmnet with reason: host reimage
  • 01:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2152.codfw.wmnet with reason: host reimage
  • 01:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P21061 and previous config saved to /var/cache/conftool/dbconfig/20220221-014934-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2152.codfw.wmnet with OS bullseye
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T302185)', diff saved to https://phabricator.wikimedia.org/P21060 and previous config saved to /var/cache/conftool/dbconfig/20220221-013811-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 01:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298554)', diff saved to https://phabricator.wikimedia.org/P21059 and previous config saved to /var/cache/conftool/dbconfig/20220221-013429-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T298554)', diff saved to https://phabricator.wikimedia.org/P21058 and previous config saved to /var/cache/conftool/dbconfig/20220221-012649-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298554)', diff saved to https://phabricator.wikimedia.org/P21057 and previous config saved to /var/cache/conftool/dbconfig/20220221-012642-ladsgroup.json
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P21056 and previous config saved to /var/cache/conftool/dbconfig/20220221-011137-ladsgroup.json
  • 00:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P21055 and previous config saved to /var/cache/conftool/dbconfig/20220221-005632-ladsgroup.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298554)', diff saved to https://phabricator.wikimedia.org/P21054 and previous config saved to /var/cache/conftool/dbconfig/20220221-004128-ladsgroup.json
  • 00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T298554)', diff saved to https://phabricator.wikimedia.org/P21053 and previous config saved to /var/cache/conftool/dbconfig/20220221-001641-ladsgroup.json
  • 00:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance

2022-02-20

  • 12:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:27 taavi@deploy1002: Synchronized private/PrivateSettings.php: T302047 (duration: 00m 49s)

2022-02-19

  • 16:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:40 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 00m 48s)
  • 16:38 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 00m 48s)
  • 16:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:24 _joe_: restarted php-fpm on wtp1027
  • 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:25 legoktm@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 00m 47s)
  • 03:03 legoktm@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 00m 31s)
  • 03:00 legoktm@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 00m 48s)
  • 02:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:46 legoktm@deploy1002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 00m 37s)
  • 02:29 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: T302047 (duration: 00m 48s)
  • 02:16 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: T302047 (duration: 00m 48s)
  • 02:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2022.codfw.wmnet with OS bullseye
  • 02:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 02:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:58 cdanis@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable wmgEmergencyCaptcha and enable AbuseFilter throttling for enwiki aebac8fe1 7618ff941 T302047 (duration: 00m 48s)
  • 01:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2022.codfw.wmnet with reason: host reimage
  • 01:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2022.codfw.wmnet with OS bullseye
  • 01:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2021.codfw.wmnet with OS bullseye
  • 01:33 legoktm@deploy1002: Synchronized private/PrivateSettings.php: T302047 tweaks (duration: 00m 48s)
  • 01:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 01:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:21 legoktm@deploy1002: Synchronized private/PrivateSettings.php: T302047 (duration: 00m 49s)
  • 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2021.codfw.wmnet with reason: host reimage
  • 01:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2021.codfw.wmnet with OS bullseye
  • 00:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2020.codfw.wmnet with OS bullseye
  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 00:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2020.codfw.wmnet with reason: host reimage
  • 00:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2020.codfw.wmnet with OS bullseye
  • 00:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2019.codfw.wmnet with OS bullseye
  • 00:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage
  • 00:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2019.codfw.wmnet with reason: host reimage

2022-02-18

  • 23:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2019.codfw.wmnet with OS bullseye
  • 23:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:34 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Revert "enable wmgEmergencyCaptcha for enwiki"" (duration: 00m 50s)
  • 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2001.codfw.wmnet with OS bullseye
  • 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
  • 22:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2001.codfw.wmnet with OS bullseye
  • 22:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-cache2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2003.codfw.wmnet with OS bullseye
  • 22:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
  • 22:20 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-cache2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
  • 21:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2003.codfw.wmnet with OS bullseye
  • 21:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2002.codfw.wmnet with OS bullseye
  • 21:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
  • 21:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS bullseye
  • 21:22 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache2002.codfw.wmnet with OS bullseye
  • 20:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS bullseye
  • 18:06 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1093.eqiad.wmnet with OS bullseye
  • 17:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300774)', diff saved to https://phabricator.wikimedia.org/P21045 and previous config saved to /var/cache/conftool/dbconfig/20220218-174640-kormat.json
  • 17:31 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P21044 and previous config saved to /var/cache/conftool/dbconfig/20220218-173135-kormat.json
  • 17:26 ariel@deploy1002: Finished deploy [dumps/dumps@f7c16d4]: noop script, dup jobname check for api jobs, do flow dumps in pieces like stubs (duration: 00m 03s)
  • 17:26 ariel@deploy1002: Started deploy [dumps/dumps@f7c16d4]: noop script, dup jobname check for api jobs, do flow dumps in pieces like stubs
  • 17:16 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P21043 and previous config saved to /var/cache/conftool/dbconfig/20220218-171630-kormat.json
  • 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2022.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:01 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300774)', diff saved to https://phabricator.wikimedia.org/P21042 and previous config saved to /var/cache/conftool/dbconfig/20220218-170125-kormat.json
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2022.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2021.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:47 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2021.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:46 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300774)', diff saved to https://phabricator.wikimedia.org/P21041 and previous config saved to /var/cache/conftool/dbconfig/20220218-164434-kormat.json
  • 16:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:45 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:44 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300774)', diff saved to https://phabricator.wikimedia.org/P21040 and previous config saved to /var/cache/conftool/dbconfig/20220218-164427-kormat.json
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2020.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2020.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2019.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:29 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P21039 and previous config saved to /var/cache/conftool/dbconfig/20220218-162922-kormat.json
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2019.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:14 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P21038 and previous config saved to /var/cache/conftool/dbconfig/20220218-161417-kormat.json
  • 16:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-cache2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-cache2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-cache2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300774)', diff saved to https://phabricator.wikimedia.org/P21037 and previous config saved to /var/cache/conftool/dbconfig/20220218-155912-kormat.json
  • 15:57 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300774)', diff saved to https://phabricator.wikimedia.org/P21036 and previous config saved to /var/cache/conftool/dbconfig/20220218-155659-kormat.json
  • 15:57 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 15:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 15:56 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300774)', diff saved to https://phabricator.wikimedia.org/P21035 and previous config saved to /var/cache/conftool/dbconfig/20220218-155652-kormat.json
  • 15:56 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1093.eqiad.wmnet with OS bullseye
  • 15:52 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-cache2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-cache2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:41 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P21034 and previous config saved to /var/cache/conftool/dbconfig/20220218-154147-kormat.json
  • 15:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-cache2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-cache2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:26 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P21033 and previous config saved to /var/cache/conftool/dbconfig/20220218-152641-kormat.json
  • 15:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:21 cdanis@deploy1002: Synchronized wmf-config/InitialiseSettings.php: disable wmgEmergencyCaptcha for enwiki 286f99886 T302047 (duration: 00m 49s)
  • 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:16 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 15:15 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ml-cache2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:14 cdanis@deploy1002: Synchronized wmf-config/InitialiseSettings.php: re-enable AbuseFilter throttling on enwiki 808d82dcd T302047 (duration: 00m 49s)
  • 15:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300774)', diff saved to https://phabricator.wikimedia.org/P21032 and previous config saved to /var/cache/conftool/dbconfig/20220218-151136-kormat.json
  • 14:58 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300774)', diff saved to https://phabricator.wikimedia.org/P21031 and previous config saved to /var/cache/conftool/dbconfig/20220218-145820-kormat.json
  • 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1009.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 14:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1009.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 14:44 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 14:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 14:29 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:29 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:15 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21030 and previous config saved to /var/cache/conftool/dbconfig/20220218-141517-kormat.json
  • 14:06 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:04 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 14:03 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:02 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 14:02 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 14:01 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:01 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 14:00 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 14:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P21029 and previous config saved to /var/cache/conftool/dbconfig/20220218-140012-kormat.json
  • 13:59 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.prepare-upgrade (exit_code=99)
  • 13:59 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 13:45 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P21028 and previous config saved to /var/cache/conftool/dbconfig/20220218-134508-kormat.json
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1012.eqiad.wmnet with OS buster
  • 13:31 dcausse: restarting blazegraph on wdqs1012 (JVM stuck for 8 hours)
  • 13:30 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21027 and previous config saved to /var/cache/conftool/dbconfig/20220218-133003-kormat.json
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
  • 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
  • 13:13 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21026 and previous config saved to /var/cache/conftool/dbconfig/20220218-131315-kormat.json
  • 13:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:13 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300774)', diff saved to https://phabricator.wikimedia.org/P21025 and previous config saved to /var/cache/conftool/dbconfig/20220218-131307-kormat.json
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1012.eqiad.wmnet with OS buster
  • 13:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1093.eqiad.wmnet with OS bullseye
  • 12:58 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P21024 and previous config saved to /var/cache/conftool/dbconfig/20220218-125802-kormat.json
  • 12:42 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P21023 and previous config saved to /var/cache/conftool/dbconfig/20220218-124258-kormat.json
  • 12:37 arturo: aborrero@apt1001:~$ sudo -i reprepro -C main includedeb bullseye-wikimedia /home/aborrero/prometheus-openstack-exporter_0.1.4-2_all.deb (T302050)
  • 12:37 arturo: aborrero@apt1001:~$ sudo -i reprepro -C main includedeb buster-wikimedia /home/aborrero/prometheus-openstack-exporter_0.1.4-2_all.deb (T302050)
  • 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300774)', diff saved to https://phabricator.wikimedia.org/P21022 and previous config saved to /var/cache/conftool/dbconfig/20220218-122753-kormat.json
  • 12:22 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1093.eqiad.wmnet with OS bullseye
  • 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300774)', diff saved to https://phabricator.wikimedia.org/P21021 and previous config saved to /var/cache/conftool/dbconfig/20220218-121126-kormat.json
  • 12:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21020 and previous config saved to /var/cache/conftool/dbconfig/20220218-121113-kormat.json
  • 12:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:56 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P21019 and previous config saved to /var/cache/conftool/dbconfig/20220218-115608-kormat.json
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1017.eqiad.wmnet with OS buster
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1017.eqiad.wmnet with reason: host reimage
  • 11:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1017.eqiad.wmnet with reason: host reimage
  • 11:41 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P21018 and previous config saved to /var/cache/conftool/dbconfig/20220218-114103-kormat.json
  • 11:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1017.eqiad.wmnet with OS buster
  • 11:26 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21017 and previous config saved to /var/cache/conftool/dbconfig/20220218-112558-kormat.json
  • 11:05 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21016 and previous config saved to /var/cache/conftool/dbconfig/20220218-110506-kormat.json
  • 11:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 11:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 11:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21015 and previous config saved to /var/cache/conftool/dbconfig/20220218-110459-kormat.json
  • 10:50 moritzm: installing zsh security updates on stretch
  • 10:49 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21014 and previous config saved to /var/cache/conftool/dbconfig/20220218-104954-kormat.json
  • 10:43 Emperor: truncate swift/server.log.1 to 10G on thanos-be2001 T301657
  • 10:37 Emperor: rsyslog-rotate to clear held-open server.log.1 (ms-be[2028-2030,2032,2037-2038,2040,2046-2047,2050-2051,2053-2054,2057,2060,2063,2065].codfw.wmnet,ms-be[1028-1031,1035-1038,1042,1046,1048-1049,1054,1058-1060,1065,1067].eqiad.wmnet,thanos-be2001.codfw.wmnet) T301657
  • 10:34 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21013 and previous config saved to /var/cache/conftool/dbconfig/20220218-103449-kormat.json
  • 10:20 godog: truncate /var/log/swift/server.log.1 to 30G due to full root fs - T301657
  • 10:19 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21012 and previous config saved to /var/cache/conftool/dbconfig/20220218-101945-kormat.json
  • 10:01 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300774)', diff saved to https://phabricator.wikimedia.org/P21011 and previous config saved to /var/cache/conftool/dbconfig/20220218-100135-kormat.json
  • 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:00 kormat: deploying schema change to s2 T300774
  • 09:35 moritzm: draining instances off ganeti1009
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1022.eqiad.wmnet with OS buster
  • 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1022.eqiad.wmnet with reason: host reimage
  • 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2001.codfw.wmnet
  • 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1022.eqiad.wmnet with reason: host reimage
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
  • 08:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
  • 08:53 kart_: Updated cxserver to 2022-02-15-050044-production (T301443)
  • 08:52 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:50 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:47 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1022.eqiad.wmnet with OS buster
  • 08:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:39 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:19 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 08:19 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 07:57 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:57 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:57 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:41 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 02:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:12 cdanis@deploy1002: Synchronized wmf-config/InitialiseSettings.php: enable wmgEmergencyCaptcha for enwiki ff2f7ef64 T302047 (duration: 00m 49s)
  • 02:09 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:03 cdanis@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable AbuseFilter throttling on enwiki 6692b4642 T302047 (duration: 00m 49s)

2022-02-17

  • 22:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:19 razzi@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=93) for new host datahubsearch1002.eqiad.wmnet
  • 20:04 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@66350a9]: (no justification provided) (duration: 02m 02s)
  • 20:02 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@66350a9]: (no justification provided)
  • 19:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase-dev2003.codfw.wmnet with OS buster
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P21009 and previous config saved to /var/cache/conftool/dbconfig/20220217-195302-ladsgroup.json
  • 19:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase-dev2003.codfw.wmnet with reason: host reimage
  • 19:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase-dev2003.codfw.wmnet with reason: host reimage
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21008 and previous config saved to /var/cache/conftool/dbconfig/20220217-193757-ladsgroup.json
  • 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase-dev2002.codfw.wmnet with OS buster
  • 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase-dev2002.codfw.wmnet with reason: host reimage
  • 19:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase-dev2003.codfw.wmnet with OS buster
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P21007 and previous config saved to /var/cache/conftool/dbconfig/20220217-192252-ladsgroup.json
  • 19:22 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase-dev2002.codfw.wmnet with reason: host reimage
  • 19:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase-dev2001.codfw.wmnet with OS buster
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase-dev2001.codfw.wmnet with reason: host reimage
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase-dev2001.codfw.wmnet with reason: host reimage
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P21006 and previous config saved to /var/cache/conftool/dbconfig/20220217-190748-ladsgroup.json
  • 19:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase-dev2002.codfw.wmnet with OS buster
  • 19:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:54 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300774)', diff saved to https://phabricator.wikimedia.org/P21005 and previous config saved to /var/cache/conftool/dbconfig/20220217-185414-kormat.json
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P21004 and previous config saved to /var/cache/conftool/dbconfig/20220217-185414-ladsgroup.json
  • 18:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase-dev2001.codfw.wmnet with OS buster
  • 18:39 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21003 and previous config saved to /var/cache/conftool/dbconfig/20220217-183910-kormat.json
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21002 and previous config saved to /var/cache/conftool/dbconfig/20220217-183909-ladsgroup.json
  • 18:34 accraze@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 18:31 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 18:24 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21001 and previous config saved to /var/cache/conftool/dbconfig/20220217-182405-kormat.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21000 and previous config saved to /var/cache/conftool/dbconfig/20220217-182405-ladsgroup.json
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20999 and previous config saved to /var/cache/conftool/dbconfig/20220217-180900-ladsgroup.json
  • 18:06 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300774)', diff saved to https://phabricator.wikimedia.org/P20998 and previous config saved to /var/cache/conftool/dbconfig/20220217-180647-kormat.json
  • 18:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 18:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 18:06 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P20997 and previous config saved to /var/cache/conftool/dbconfig/20220217-180639-kormat.json
  • 17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1105.eqiad.wmnet with OS bullseye
  • 17:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch1001.eqiad.wmnet with reason: Node is being set up for first time and puppet run failed
  • 17:54 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch1001.eqiad.wmnet with reason: Node is being set up for first time and puppet run failed
  • 17:53 razzi@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
  • 17:53 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-test-coord1001.eqiad.wmnet with reason: Still troubleshooting mariadb issues
  • 17:51 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P20995 and previous config saved to /var/cache/conftool/dbconfig/20220217-175135-kormat.json
  • 17:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1105.eqiad.wmnet with reason: host reimage
  • 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1105.eqiad.wmnet with reason: host reimage
  • 17:36 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P20994 and previous config saved to /var/cache/conftool/dbconfig/20220217-173630-kormat.json
  • 17:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1105.eqiad.wmnet with OS bullseye
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20993 and previous config saved to /var/cache/conftool/dbconfig/20220217-172650-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20992 and previous config saved to /var/cache/conftool/dbconfig/20220217-172504-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P20991 and previous config saved to /var/cache/conftool/dbconfig/20220217-172124-kormat.json
  • 17:19 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes-staging,service=kubesvc
  • 17:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS bullseye
  • 17:11 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
  • 17:11 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host datahubsearch1002.eqiad.wmnet
  • 17:09 XioNoX: stop advertising drmrs from esams
  • 16:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
  • 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 16:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
  • 16:27 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 16:21 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300774)', diff saved to https://phabricator.wikimedia.org/P20990 and previous config saved to /var/cache/conftool/dbconfig/20220217-162104-kormat.json
  • 16:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:21 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20989 and previous config saved to /var/cache/conftool/dbconfig/20220217-162056-kormat.json
  • 16:20 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS bullseye
  • 16:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P20988 and previous config saved to /var/cache/conftool/dbconfig/20220217-160551-kormat.json
  • 15:50 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P20987 and previous config saved to /var/cache/conftool/dbconfig/20220217-155047-kormat.json
  • 15:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
  • 15:46 ejegg: updated fundraising CiviCRM from 84953e1d to 2874d623
  • 15:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
  • 15:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20986 and previous config saved to /var/cache/conftool/dbconfig/20220217-153542-kormat.json
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on testvm[2001-2003].codfw.wmnet with reason: Instance restarts
  • 15:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on testvm[2001-2003].codfw.wmnet with reason: Instance restarts
  • 15:23 moritzm: imported openjdk-8 8u322-b06-1~deb11u1 for bullseye-wikimedia (forward port of latest Java 8 security fixes)
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 15:10 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20984 and previous config saved to /var/cache/conftool/dbconfig/20220217-151021-kormat.json
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 15:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 15:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 15:09 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20983 and previous config saved to /var/cache/conftool/dbconfig/20220217-150941-kormat.json
  • 15:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P20982 and previous config saved to /var/cache/conftool/dbconfig/20220217-145436-kormat.json
  • 14:47 hashar: UTC evening backport and config training has completed.
  • 14:45 hashar@deploy1002: Synchronized wmf-config/interwiki.php: Config: Regen interwiki cache to drop erroneous 'wikipedia' (T301936) (duration: 00m 48s)
  • 14:44 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@3a25565]: (no justification provided) (duration: 02m 04s)
  • 14:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:42 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@3a25565]: (no justification provided)
  • 14:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:39 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P20981 and previous config saved to /var/cache/conftool/dbconfig/20220217-143931-kormat.json
  • 14:32 hashar@deploy1002: Synchronized php-1.38.0-wmf.22/extensions/WikimediaMaintenance/dumpInterwiki.php: Backport: Stop excluding the 'wikipedia' interwiki prefix (T301936) (duration: 00m 48s)
  • 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:24 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable RelatedArticles for desktop (non-mobile) view at zhwikinews (T299856) (duration: 00m 49s)
  • 14:24 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20980 and previous config saved to /var/cache/conftool/dbconfig/20220217-142427-kormat.json
  • 14:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:19 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: General cleanup, wgAddGroups (R-Z) (T301647) (no-op) (duration: 00m 50s)
  • 13:58 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20979 and previous config saved to /var/cache/conftool/dbconfig/20220217-135831-kormat.json
  • 13:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:43 moritzm: installing paramiko security updates
  • 13:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:18 moritzm: installing zsh security updates
  • 13:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:11 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300774)', diff saved to https://phabricator.wikimedia.org/P20977 and previous config saved to /var/cache/conftool/dbconfig/20220217-131111-kormat.json
  • 13:01 moritzm: installing expat security updates
  • 12:56 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20976 and previous config saved to /var/cache/conftool/dbconfig/20220217-125607-kormat.json
  • 12:41 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20975 and previous config saved to /var/cache/conftool/dbconfig/20220217-124102-kormat.json
  • 12:25 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300774)', diff saved to https://phabricator.wikimedia.org/P20974 and previous config saved to /var/cache/conftool/dbconfig/20220217-122557-kormat.json
  • 12:00 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300774)', diff saved to https://phabricator.wikimedia.org/P20973 and previous config saved to /var/cache/conftool/dbconfig/20220217-120014-kormat.json
  • 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20972 and previous config saved to /var/cache/conftool/dbconfig/20220217-120001-kormat.json
  • 11:44 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20971 and previous config saved to /var/cache/conftool/dbconfig/20220217-114456-kormat.json
  • 11:29 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20970 and previous config saved to /var/cache/conftool/dbconfig/20220217-112951-kormat.json
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1046.eqiad.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: elastic1046.eqiad.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1043.eqiad.wmnet
  • 11:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: elastic1043.eqiad.wmnet
  • 11:14 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20969 and previous config saved to /var/cache/conftool/dbconfig/20220217-111447-kormat.json
  • 11:01 moritzm: installing python3.5 security updates
  • 10:46 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300774)', diff saved to https://phabricator.wikimedia.org/P20968 and previous config saved to /var/cache/conftool/dbconfig/20220217-104653-kormat.json
  • 10:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:46 kormat: running schema change against s5 T300774
  • 10:32 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 10:32 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:50 moritzm: migrate instances off ganeti1012
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 09:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:39 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.22 refs T300198
  • 08:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:26 urbanecm: UTC early B&C now really done
  • 08:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c0cbd30: Deploy Growth features to 100% of newcomers on most Wikipedias (T301820) (duration: 00m 50s)
  • 08:22 apergos: UTC early B&C window NOT completed, woops.
  • 08:21 apergos: UTC early B&C window completed
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation in Occitan and Luganda WPs + CX out-of-Beta for Luganda WP (T301443) (duration: 00m 51s)
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300381)', diff saved to https://phabricator.wikimedia.org/P20967 and previous config saved to /var/cache/conftool/dbconfig/20220217-062708-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P20966 and previous config saved to /var/cache/conftool/dbconfig/20220217-061203-marostegui.json
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P20965 and previous config saved to /var/cache/conftool/dbconfig/20220217-055659-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300381)', diff saved to https://phabricator.wikimedia.org/P20964 and previous config saved to /var/cache/conftool/dbconfig/20220217-054154-marostegui.json
  • 04:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300381)', diff saved to https://phabricator.wikimedia.org/P20963 and previous config saved to /var/cache/conftool/dbconfig/20220217-041721-marostegui.json
  • 04:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 04:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 04:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300381)', diff saved to https://phabricator.wikimedia.org/P20962 and previous config saved to /var/cache/conftool/dbconfig/20220217-041713-marostegui.json
  • 04:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P20961 and previous config saved to /var/cache/conftool/dbconfig/20220217-040208-marostegui.json
  • 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P20960 and previous config saved to /var/cache/conftool/dbconfig/20220217-034704-marostegui.json
  • 03:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300381)', diff saved to https://phabricator.wikimedia.org/P20959 and previous config saved to /var/cache/conftool/dbconfig/20220217-033159-marostegui.json
  • 02:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300381)', diff saved to https://phabricator.wikimedia.org/P20958 and previous config saved to /var/cache/conftool/dbconfig/20220217-022128-marostegui.json
  • 02:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 02:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 02:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20957 and previous config saved to /var/cache/conftool/dbconfig/20220217-022121-marostegui.json
  • 02:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P20956 and previous config saved to /var/cache/conftool/dbconfig/20220217-020616-marostegui.json
  • 01:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P20955 and previous config saved to /var/cache/conftool/dbconfig/20220217-015111-marostegui.json
  • 01:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20954 and previous config saved to /var/cache/conftool/dbconfig/20220217-013607-marostegui.json
  • 00:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20953 and previous config saved to /var/cache/conftool/dbconfig/20220217-001907-marostegui.json
  • 00:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 00:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300381)', diff saved to https://phabricator.wikimedia.org/P20952 and previous config saved to /var/cache/conftool/dbconfig/20220217-001859-marostegui.json
  • 00:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P20951 and previous config saved to /var/cache/conftool/dbconfig/20220217-000355-marostegui.json

2022-02-16

  • 23:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P20950 and previous config saved to /var/cache/conftool/dbconfig/20220216-234850-marostegui.json
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300381)', diff saved to https://phabricator.wikimedia.org/P20949 and previous config saved to /var/cache/conftool/dbconfig/20220216-233345-marostegui.json
  • 23:28 topranks: test reboot of lsw1-e1-eqiad - not in service.
  • 23:09 tgr@deploy1002: Synchronized wmf-config/logos.php: Config: Use huwiki 500k milestone logos (T301923) (duration: 00m 49s)
  • 23:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:58 tgr@deploy1002: Synchronized logos/config.yaml: Config: Add huwiki 500k milestone logos (T301923) (duration: 00m 49s)
  • 22:57 tgr@deploy1002: Synchronized static/images/project-logos/: Config: Add huwiki 500k milestone logos (T301923) (duration: 00m 50s)
  • 22:49 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable image recommendations on eswiki (T301276) (duration: 00m 52s)
  • 22:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20948 and previous config saved to /var/cache/conftool/dbconfig/20220216-222329-root.json
  • 22:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic
  • 22:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on doh[6001-6002].wikimedia.org with reason: T301165; errors expected, not serving any traffic
  • 22:15 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic
  • 22:15 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on durum[6001-6002].drmrs.wmnet with reason: T301165; errors expected, not serving any traffic
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300381)', diff saved to https://phabricator.wikimedia.org/P20946 and previous config saved to /var/cache/conftool/dbconfig/20220216-221456-marostegui.json
  • 22:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 22:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300381)', diff saved to https://phabricator.wikimedia.org/P20945 and previous config saved to /var/cache/conftool/dbconfig/20220216-221448-marostegui.json
  • 22:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20944 and previous config saved to /var/cache/conftool/dbconfig/20220216-220826-root.json
  • 21:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P20943 and previous config saved to /var/cache/conftool/dbconfig/20220216-215944-marostegui.json
  • 21:55 tgr@deploy1002: Synchronized php-1.38.0-wmf.22/includes/EditPage.php: Backport: EditPage: Parse wikitext in the usual way in the copyright message (T301890) (duration: 00m 49s)
  • 21:54 mutante: merged Alex's changes, built prometheus-etherpad-exporter_0.6 on deneb, imported on apt1001, ran reprepro export, installed new version on etherpad1003 T301872
  • 21:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20942 and previous config saved to /var/cache/conftool/dbconfig/20220216-215322-root.json
  • 21:52 tgr: ran mwscript updateCollation.php abwiki --force
  • 21:49 tgr@deploy1002: Synchronized php-1.38.0-wmf.22/includes/collation/AbkhazUppercaseCollation.php: Backport: Add Ӷ and Ԥ to Abkhaz collation (T298309) (duration: 00m 49s)
  • 21:48 tgr@deploy1002: Synchronized php-1.38.0-wmf.21/includes/collation/AbkhazUppercaseCollation.php: Backport: Add Ӷ and Ԥ to Abkhaz collation (T298309) (duration: 00m 49s)
  • 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P20941 and previous config saved to /var/cache/conftool/dbconfig/20220216-214439-marostegui.json
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20940 and previous config saved to /var/cache/conftool/dbconfig/20220216-213819-root.json
  • 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300381)', diff saved to https://phabricator.wikimedia.org/P20939 and previous config saved to /var/cache/conftool/dbconfig/20220216-212934-marostegui.json
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 21:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 21:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20938 and previous config saved to /var/cache/conftool/dbconfig/20220216-212315-root.json
  • 21:16 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: General cleanup, wgAddGroups (J-P) (T301647) (duration: 00m 51s)
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300381)', diff saved to https://phabricator.wikimedia.org/P20937 and previous config saved to /var/cache/conftool/dbconfig/20220216-200922-marostegui.json
  • 20:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 20:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 20:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300381)', diff saved to https://phabricator.wikimedia.org/P20936 and previous config saved to /var/cache/conftool/dbconfig/20220216-200914-marostegui.json
  • 19:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P20934 and previous config saved to /var/cache/conftool/dbconfig/20220216-195410-marostegui.json
  • 19:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P20933 and previous config saved to /var/cache/conftool/dbconfig/20220216-193905-marostegui.json
  • 19:33 tzatziki: removing 28 files for legal compliance
  • 19:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300381)', diff saved to https://phabricator.wikimedia.org/P20932 and previous config saved to /var/cache/conftool/dbconfig/20220216-192400-marostegui.json
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:49 mutante: deploying OTRS config change
  • 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300381)', diff saved to https://phabricator.wikimedia.org/P20931 and previous config saved to /var/cache/conftool/dbconfig/20220216-181706-marostegui.json
  • 18:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300381)', diff saved to https://phabricator.wikimedia.org/P20930 and previous config saved to /var/cache/conftool/dbconfig/20220216-181651-marostegui.json
  • 18:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P20929 and previous config saved to /var/cache/conftool/dbconfig/20220216-180146-marostegui.json
  • 17:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P20926 and previous config saved to /var/cache/conftool/dbconfig/20220216-174641-marostegui.json
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300381)', diff saved to https://phabricator.wikimedia.org/P20925 and previous config saved to /var/cache/conftool/dbconfig/20220216-173137-marostegui.json
  • 17:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 17:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 17:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 17:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bullseye
  • 17:25 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 17:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 17:13 accraze@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 17:13 accraze@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 17:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 17:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint2002.wikimedia.org with OS buster
  • 16:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bullseye
  • 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint2002.wikimedia.org with reason: host reimage
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on contint2002.wikimedia.org with reason: host reimage
  • 16:51 mutante: contint2001 - temp disabled puppet (active CI server) - contint1001 - attempting to install newer docker version (gerrit:758987 T300682)
  • 16:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host contint2002.wikimedia.org with OS buster
  • 16:33 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300774)', diff saved to https://phabricator.wikimedia.org/P20923 and previous config saved to /var/cache/conftool/dbconfig/20220216-163308-kormat.json
  • 16:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: Use ParserOutputAccess for accessing ParserOutput (T283029) (duration: 00m 49s)
  • 16:18 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P20922 and previous config saved to /var/cache/conftool/dbconfig/20220216-161803-kormat.json
  • 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300381)', diff saved to https://phabricator.wikimedia.org/P20921 and previous config saved to /var/cache/conftool/dbconfig/20220216-161054-marostegui.json
  • 16:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 16:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300381)', diff saved to https://phabricator.wikimedia.org/P20920 and previous config saved to /var/cache/conftool/dbconfig/20220216-161047-marostegui.json
  • 16:10 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/includes/page/ParserOutputAccess.php: Backport: ParserOutputAccess: Cache Parsing inside the class as well (T301310) (duration: 00m 52s)
  • 16:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:06 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.22/includes/page/ParserOutputAccess.php: Backport: ParserOutputAccess: Cache Parsing inside the class as well (T301310) (duration: 00m 54s)
  • 16:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:02 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P20919 and previous config saved to /var/cache/conftool/dbconfig/20220216-160257-kormat.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P20918 and previous config saved to /var/cache/conftool/dbconfig/20220216-155542-marostegui.json
  • 15:47 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300774)', diff saved to https://phabricator.wikimedia.org/P20917 and previous config saved to /var/cache/conftool/dbconfig/20220216-154752-kormat.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P20916 and previous config saved to /var/cache/conftool/dbconfig/20220216-154037-marostegui.json
  • 15:35 moritzm: installing zsh security updates
  • 15:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T300774)', diff saved to https://phabricator.wikimedia.org/P20915 and previous config saved to /var/cache/conftool/dbconfig/20220216-153456-kormat.json
  • 15:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 15:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 15:34 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300774)', diff saved to https://phabricator.wikimedia.org/P20914 and previous config saved to /var/cache/conftool/dbconfig/20220216-153448-kormat.json
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300381)', diff saved to https://phabricator.wikimedia.org/P20913 and previous config saved to /var/cache/conftool/dbconfig/20220216-152529-marostegui.json
  • 15:19 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20912 and previous config saved to /var/cache/conftool/dbconfig/20220216-151944-kormat.json
  • 15:04 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20911 and previous config saved to /var/cache/conftool/dbconfig/20220216-150439-kormat.json
  • 15:04 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 15:03 jelto@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 15:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 15:00 jelto@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:49 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300774)', diff saved to https://phabricator.wikimedia.org/P20910 and previous config saved to /var/cache/conftool/dbconfig/20220216-144934-kormat.json
  • 14:47 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T300774)', diff saved to https://phabricator.wikimedia.org/P20909 and previous config saved to /var/cache/conftool/dbconfig/20220216-144726-kormat.json
  • 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 14:44 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 14:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 14:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 14:35 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300774)', diff saved to https://phabricator.wikimedia.org/P20908 and previous config saved to /var/cache/conftool/dbconfig/20220216-143535-kormat.json
  • 14:21 moritzm: migrate instances off ganeti1017
  • 14:20 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20907 and previous config saved to /var/cache/conftool/dbconfig/20220216-142030-kormat.json
  • 14:17 sukhe: disabled puppet on all doh* hosts except doh3001
  • 14:17 moritzm: failover the ganeti master to ganeti1024 T296721
  • 14:16 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2073.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:16 volans@cumin2002: START - Cookbook sre.hosts.provision for host elastic2073.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300381)', diff saved to https://phabricator.wikimedia.org/P20906 and previous config saved to /var/cache/conftool/dbconfig/20220216-141546-marostegui.json
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@8991326]: (no justification provided) (duration: 00m 07s)
  • 14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@8991326]: (no justification provided)
  • 14:05 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20905 and previous config saved to /var/cache/conftool/dbconfig/20220216-140526-kormat.json
  • 13:50 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300774)', diff saved to https://phabricator.wikimedia.org/P20903 and previous config saved to /var/cache/conftool/dbconfig/20220216-135021-kormat.json
  • 13:46 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T300774)', diff saved to https://phabricator.wikimedia.org/P20902 and previous config saved to /var/cache/conftool/dbconfig/20220216-134612-kormat.json
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300774)', diff saved to https://phabricator.wikimedia.org/P20901 and previous config saved to /var/cache/conftool/dbconfig/20220216-134559-kormat.json
  • 13:30 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20900 and previous config saved to /var/cache/conftool/dbconfig/20220216-133054-kormat.json
  • 13:29 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:29 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:29 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:27 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:27 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:24 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:23 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300775)', diff saved to https://phabricator.wikimedia.org/P20899 and previous config saved to /var/cache/conftool/dbconfig/20220216-132322-marostegui.json
  • 13:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:23 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:21 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:21 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:15 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20898 and previous config saved to /var/cache/conftool/dbconfig/20220216-131549-kormat.json
  • 13:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 13:12 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:00 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300774)', diff saved to https://phabricator.wikimedia.org/P20897 and previous config saved to /var/cache/conftool/dbconfig/20220216-130044-kormat.json
  • 12:46 moritzm: installing apache-log4j1.2 security updates
  • 12:42 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T300774)', diff saved to https://phabricator.wikimedia.org/P20896 and previous config saved to /var/cache/conftool/dbconfig/20220216-124232-kormat.json
  • 12:42 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:42 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:42 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300774)', diff saved to https://phabricator.wikimedia.org/P20895 and previous config saved to /var/cache/conftool/dbconfig/20220216-124225-kormat.json
  • 12:27 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20894 and previous config saved to /var/cache/conftool/dbconfig/20220216-122720-kormat.json
  • 12:12 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20893 and previous config saved to /var/cache/conftool/dbconfig/20220216-121215-kormat.json
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300381)', diff saved to https://phabricator.wikimedia.org/P20892 and previous config saved to /var/cache/conftool/dbconfig/20220216-120840-marostegui.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20891 and previous config saved to /var/cache/conftool/dbconfig/20220216-120659-ladsgroup.json
  • 12:06 moritzm: configure ganeti1024/ganeti1027/ganeti1028 as master candidates for eqiad Ganeti cluster
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1011.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 11:57 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300774)', diff saved to https://phabricator.wikimedia.org/P20890 and previous config saved to /var/cache/conftool/dbconfig/20220216-115711-kormat.json
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P20889 and previous config saved to /var/cache/conftool/dbconfig/20220216-115336-marostegui.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20888 and previous config saved to /var/cache/conftool/dbconfig/20220216-115155-ladsgroup.json
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 11:43 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T300774)', diff saved to https://phabricator.wikimedia.org/P20887 and previous config saved to /var/cache/conftool/dbconfig/20220216-114310-kormat.json
  • 11:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:43 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300774)', diff saved to https://phabricator.wikimedia.org/P20886 and previous config saved to /var/cache/conftool/dbconfig/20220216-114303-kormat.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P20885 and previous config saved to /var/cache/conftool/dbconfig/20220216-113831-marostegui.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20884 and previous config saved to /var/cache/conftool/dbconfig/20220216-113650-ladsgroup.json
  • 11:27 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20883 and previous config saved to /var/cache/conftool/dbconfig/20220216-112758-kormat.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300381)', diff saved to https://phabricator.wikimedia.org/P20882 and previous config saved to /var/cache/conftool/dbconfig/20220216-112326-marostegui.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20881 and previous config saved to /var/cache/conftool/dbconfig/20220216-112145-ladsgroup.json
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20880 and previous config saved to /var/cache/conftool/dbconfig/20220216-111253-kormat.json
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20879 and previous config saved to /var/cache/conftool/dbconfig/20220216-110816-ladsgroup.json
  • 11:07 moritzm: restarting apache on prometheus nodes to pick up expat security updates
  • 10:57 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300774)', diff saved to https://phabricator.wikimedia.org/P20878 and previous config saved to /var/cache/conftool/dbconfig/20220216-105748-kormat.json
  • 10:55 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T300774)', diff saved to https://phabricator.wikimedia.org/P20877 and previous config saved to /var/cache/conftool/dbconfig/20220216-105540-kormat.json
  • 10:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 10:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P20875 and previous config saved to /var/cache/conftool/dbconfig/20220216-105312-ladsgroup.json
  • 10:43 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:43 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P20873 and previous config saved to /var/cache/conftool/dbconfig/20220216-103807-ladsgroup.json
  • 10:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 10:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 10:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20872 and previous config saved to /var/cache/conftool/dbconfig/20220216-102302-ladsgroup.json
  • 10:20 moritzm: installing expat security updates
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300381)', diff saved to https://phabricator.wikimedia.org/P20871 and previous config saved to /var/cache/conftool/dbconfig/20220216-101354-marostegui.json
  • 10:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20870 and previous config saved to /var/cache/conftool/dbconfig/20220216-101346-marostegui.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P20869 and previous config saved to /var/cache/conftool/dbconfig/20220216-095841-marostegui.json
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1146.eqiad.wmnet with OS bullseye
  • 09:52 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:50 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P20868 and previous config saved to /var/cache/conftool/dbconfig/20220216-094337-marostegui.json
  • 09:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1146.eqiad.wmnet with reason: host reimage
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1146.eqiad.wmnet with reason: host reimage
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20867 and previous config saved to /var/cache/conftool/dbconfig/20220216-092832-marostegui.json
  • 09:25 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:24 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1146.eqiad.wmnet with OS bullseye
  • 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:09 hashar@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.22 refs T300198 (duration: 00m 49s)
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'T300510', diff saved to https://phabricator.wikimedia.org/P20866 and previous config saved to /var/cache/conftool/dbconfig/20220216-090924-ladsgroup.json
  • 09:08 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.22 refs T300198
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20865 and previous config saved to /var/cache/conftool/dbconfig/20220216-090737-ladsgroup.json
  • 09:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:01 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:39 urbanecm: Set an email for developer account Osnard and re-enable it (T301796)
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20864 and previous config saved to /var/cache/conftool/dbconfig/20220216-083832-root.json
  • 08:33 dcausse: restarting blazegraph on wdqs1005 (jvm stuck for 4 hours)
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20863 and previous config saved to /var/cache/conftool/dbconfig/20220216-082329-root.json
  • 08:18 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus1004.eqiad.wmnet
  • 08:13 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 9001a8c: Use $wgGroupInheritsPermissions for "confirmed" group (T275334; 2/2) (duration: 03m 39s)
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300381)', diff saved to https://phabricator.wikimedia.org/P20862 and previous config saved to /var/cache/conftool/dbconfig/20220216-081056-marostegui.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:10 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus1004.eqiad.wmnet
  • 08:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9001a8c: Use $wgGroupInheritsPermissions for "confirmed" group (T275334; 1/2) (duration: 00m 51s)
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20861 and previous config saved to /var/cache/conftool/dbconfig/20220216-080825-root.json
  • 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20860 and previous config saved to /var/cache/conftool/dbconfig/20220216-080717-ladsgroup.json
  • 08:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20859 and previous config saved to /var/cache/conftool/dbconfig/20220216-080531-ladsgroup.json
  • 08:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20858 and previous config saved to /var/cache/conftool/dbconfig/20220216-075321-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20857 and previous config saved to /var/cache/conftool/dbconfig/20220216-073818-root.json
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1133.eqiad.wmnet with OS bullseye
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1133.eqiad.wmnet with reason: host reimage
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1133.eqiad.wmnet with reason: host reimage
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300510)', diff saved to https://phabricator.wikimedia.org/P20856 and previous config saved to /var/cache/conftool/dbconfig/20220216-071125-ladsgroup.json
  • 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1133.eqiad.wmnet with OS bullseye
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P20855 and previous config saved to /var/cache/conftool/dbconfig/20220216-065620-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P20854 and previous config saved to /var/cache/conftool/dbconfig/20220216-064115-ladsgroup.json
  • 06:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: Clean up flaggedtemplate rows for deleted pages too (T296380) (duration: 00m 52s)
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300510)', diff saved to https://phabricator.wikimedia.org/P20853 and previous config saved to /var/cache/conftool/dbconfig/20220216-062610-ladsgroup.json
  • 06:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 06:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS bullseye
  • 06:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 06:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS bullseye
  • 05:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300510)', diff saved to https://phabricator.wikimedia.org/P20852 and previous config saved to /var/cache/conftool/dbconfig/20220216-054749-ladsgroup.json
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance

2022-02-15

  • 23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase-dev2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase-dev2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase-dev2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase-dev2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase-dev2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase-dev2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:14 tzatziki: Removing one file for legal compliance
  • 23:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300381)', diff saved to https://phabricator.wikimedia.org/P20850 and previous config saved to /var/cache/conftool/dbconfig/20220215-230454-marostegui.json
  • 22:55 tzatziki: Removing 5 files for legal compliance
  • 22:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P20849 and previous config saved to /var/cache/conftool/dbconfig/20220215-224950-marostegui.json
  • 22:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P20848 and previous config saved to /var/cache/conftool/dbconfig/20220215-223445-marostegui.json
  • 22:28 jhuneidi@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: sync on production
  • 22:27 jhuneidi@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply on staging
  • 22:27 jhuneidi@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply on production
  • 22:26 jhuneidi@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: sync on production
  • 22:26 jhuneidi@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply on staging
  • 22:25 jhuneidi@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply on production
  • 22:24 jhuneidi@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: sync on staging
  • 22:23 jhuneidi@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
  • 22:23 jhuneidi@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
  • 22:21 jhuneidi@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply on production
  • 22:21 jhuneidi@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply on staging
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300381)', diff saved to https://phabricator.wikimedia.org/P20847 and previous config saved to /var/cache/conftool/dbconfig/20220215-221940-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300381)', diff saved to https://phabricator.wikimedia.org/P20846 and previous config saved to /var/cache/conftool/dbconfig/20220215-220041-marostegui.json
  • 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300381)', diff saved to https://phabricator.wikimedia.org/P20845 and previous config saved to /var/cache/conftool/dbconfig/20220215-220034-marostegui.json
  • 22:00 hoo: Updated the Wikidata property suggester with data from the 2022-02-07 JSON dump (with pre-applied T132839 workarounds)
  • 21:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P20844 and previous config saved to /var/cache/conftool/dbconfig/20220215-214529-marostegui.json
  • 21:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:41 urbanecm: UTC late B&C window completed
  • 21:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2e0b51f: amiwiki: Deploy Growth features to newcomers (duration: 00m 49s)
  • 21:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:36 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: b3e8161: Apply max width setting to all Wikisource page namespaces (T300563; 2/2) (duration: 00m 49s)
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:36 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3e8161: Apply max width setting to all Wikisource page namespaces (T300563; 1/2) (duration: 00m 50s)
  • 21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P20843 and previous config saved to /var/cache/conftool/dbconfig/20220215-213024-marostegui.json
  • 21:22 eileen: civicrm revision 815e3091 -> 84953e1d
  • 21:20 eileen: localsettings checkout revision (02f4888c -> 2a6d2e45)
  • 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300381)', diff saved to https://phabricator.wikimedia.org/P20842 and previous config saved to /var/cache/conftool/dbconfig/20220215-211519-marostegui.json
  • 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d97b43e: Remove MFUseDesktopContributionsPage config (T300583) (duration: 00m 52s)
  • 20:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300381)', diff saved to https://phabricator.wikimedia.org/P20841 and previous config saved to /var/cache/conftool/dbconfig/20220215-205547-marostegui.json
  • 20:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 20:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 20:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300381)', diff saved to https://phabricator.wikimedia.org/P20840 and previous config saved to /var/cache/conftool/dbconfig/20220215-205539-marostegui.json
  • 20:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P20838 and previous config saved to /var/cache/conftool/dbconfig/20220215-204035-marostegui.json
  • 20:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P20837 and previous config saved to /var/cache/conftool/dbconfig/20220215-202530-marostegui.json
  • 20:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300381)', diff saved to https://phabricator.wikimedia.org/P20836 and previous config saved to /var/cache/conftool/dbconfig/20220215-201025-marostegui.json
  • 19:52 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1015.eqiad.wmnet with OS buster
  • 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300381)', diff saved to https://phabricator.wikimedia.org/P20835 and previous config saved to /var/cache/conftool/dbconfig/20220215-195051-marostegui.json
  • 19:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 19:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 19:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300381)', diff saved to https://phabricator.wikimedia.org/P20834 and previous config saved to /var/cache/conftool/dbconfig/20220215-195042-marostegui.json
  • 19:43 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 19:40 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 19:39 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1093.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:38 herron: beginning rolling restart of kafka-main clusters for updates
  • 19:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P20833 and previous config saved to /var/cache/conftool/dbconfig/20220215-193537-marostegui.json
  • 19:30 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host elastic1093.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS buster
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:27 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:25 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 19:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P20832 and previous config saved to /var/cache/conftool/dbconfig/20220215-192033-marostegui.json
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:12 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.22/skins/Vector: Backport: Revert "Add fetch tests from WVUI" (duration: 01m 07s)
  • 19:09 bblack: lvs1019 - start pybal/puppet with real routing, taking over low-traffic from lvs1020
  • 19:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300381)', diff saved to https://phabricator.wikimedia.org/P20831 and previous config saved to /var/cache/conftool/dbconfig/20220215-190528-marostegui.json
  • 18:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host gerrit2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:50 bblack: cr[12]-eqiad - edit static fallback for low-traffic (lvs1015 -> lvs1019)
  • 18:41 bblack: lvs1019 - disable puppet/pybal, reboot - T301142
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300381)', diff saved to https://phabricator.wikimedia.org/P20830 and previous config saved to /var/cache/conftool/dbconfig/20220215-184037-marostegui.json
  • 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300381)', diff saved to https://phabricator.wikimedia.org/P20829 and previous config saved to /var/cache/conftool/dbconfig/20220215-184023-marostegui.json
  • 18:39 herron: beginning rolling restart of kafka-logging clusters for updates
  • 18:36 bblack: lvs1019 - first prod puppetization + pybal start
  • 18:35 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host gerrit2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host contint2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host contint2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20828 and previous config saved to /var/cache/conftool/dbconfig/20220215-182519-marostegui.json
  • 18:18 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase1031.eqiad.wmnet with OS buster
  • 18:12 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1014.eqiad.wmnet with OS buster
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20827 and previous config saved to /var/cache/conftool/dbconfig/20220215-181012-marostegui.json
  • 18:02 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 17:59 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 17:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300381)', diff saved to https://phabricator.wikimedia.org/P20826 and previous config saved to /var/cache/conftool/dbconfig/20220215-175508-marostegui.json
  • 17:48 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS buster
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:47 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS buster
  • 17:42 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 17:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host contint2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:39 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync on main
  • 17:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply on main
  • 17:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply on main
  • 17:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
  • 17:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: sync on main
  • 17:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply on main
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300381)', diff saved to https://phabricator.wikimedia.org/P20824 and previous config saved to /var/cache/conftool/dbconfig/20220215-173536-marostegui.json
  • 17:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300381)', diff saved to https://phabricator.wikimedia.org/P20823 and previous config saved to /var/cache/conftool/dbconfig/20220215-173529-marostegui.json
  • 17:34 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
  • 17:33 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
  • 17:32 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
  • 17:32 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
  • 17:26 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: sync on main
  • 17:26 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply on main
  • 17:20 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P20822 and previous config saved to /var/cache/conftool/dbconfig/20220215-172024-marostegui.json
  • 17:14 bblack: lvs1018 - bringing pybal online for production upload traffic
  • 17:08 bblack: cr[12]-eqiad: manual edit static fallback route for high-traffic2 from lvs1014 to lvs1018 - T301142
  • 17:06 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P20821 and previous config saved to /var/cache/conftool/dbconfig/20220215-170520-marostegui.json
  • 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1011.eqiad.wmnet with OS buster
  • 16:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host contint2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:56 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 16:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
  • 16:51 bblack: lvs1018 - reboot
  • 16:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300381)', diff saved to https://phabricator.wikimedia.org/P20820 and previous config saved to /var/cache/conftool/dbconfig/20220215-165015-marostegui.json
  • 16:50 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T300006)', diff saved to https://phabricator.wikimedia.org/P20819 and previous config saved to /var/cache/conftool/dbconfig/20220215-164611-ladsgroup.json
  • 16:39 cwhite: logstash switchback to eqiad complete T299168
  • 16:38 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS buster
  • 16:38 bblack: lvs1018 - puppeting into prod role for first time
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P20818 and previous config saved to /var/cache/conftool/dbconfig/20220215-163106-ladsgroup.json
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300381)', diff saved to https://phabricator.wikimedia.org/P20817 and previous config saved to /var/cache/conftool/dbconfig/20220215-162949-marostegui.json
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300381)', diff saved to https://phabricator.wikimedia.org/P20816 and previous config saved to /var/cache/conftool/dbconfig/20220215-162941-marostegui.json
  • 16:26 bblack: lvs1014 - downtimed - stopping puppet+pybal to fail traffic over to lvs1020 - T301142
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P20815 and previous config saved to /var/cache/conftool/dbconfig/20220215-161601-ladsgroup.json
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P20814 and previous config saved to /var/cache/conftool/dbconfig/20220215-161436-marostegui.json
  • 16:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus2004.codfw.wmnet
  • 16:01 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus2004.codfw.wmnet
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T300006)', diff saved to https://phabricator.wikimedia.org/P20813 and previous config saved to /var/cache/conftool/dbconfig/20220215-160055-ladsgroup.json
  • 15:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P20812 and previous config saved to /var/cache/conftool/dbconfig/20220215-155931-marostegui.json
  • 15:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es1024.eqiad.wmnet with OS bullseye
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300381)', diff saved to https://phabricator.wikimedia.org/P20811 and previous config saved to /var/cache/conftool/dbconfig/20220215-154427-marostegui.json
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300381)', diff saved to https://phabricator.wikimedia.org/P20810 and previous config saved to /var/cache/conftool/dbconfig/20220215-152455-marostegui.json
  • 15:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300381)', diff saved to https://phabricator.wikimedia.org/P20809 and previous config saved to /var/cache/conftool/dbconfig/20220215-152448-marostegui.json
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host es1024.eqiad.wmnet with OS bullseye
  • 15:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus1004.eqiad.wmnet
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300510)', diff saved to https://phabricator.wikimedia.org/P20808 and previous config saved to /var/cache/conftool/dbconfig/20220215-151026-ladsgroup.json
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P20807 and previous config saved to /var/cache/conftool/dbconfig/20220215-150943-marostegui.json
  • 15:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bullseye
  • 14:56 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20806 and previous config saved to /var/cache/conftool/dbconfig/20220215-145521-ladsgroup.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P20805 and previous config saved to /var/cache/conftool/dbconfig/20220215-145438-marostegui.json
  • 14:50 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus1004.eqiad.wmnet
  • 14:40 hnowlan: removing java packages from all maps hosts
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20804 and previous config saved to /var/cache/conftool/dbconfig/20220215-144016-ladsgroup.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300381)', diff saved to https://phabricator.wikimedia.org/P20803 and previous config saved to /var/cache/conftool/dbconfig/20220215-143934-marostegui.json
  • 14:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:37 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bullseye
  • 14:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:30 Lucas_WMDE: UTC afternoon backport window done
  • 14:28 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: General cleanup (T301647) (wgAddGroups F-I) (duration: 02m 41s)
  • 14:28 moritzm: installing clamav security updates on otrs1001 / ticket.wikimedia.org
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300510)', diff saved to https://phabricator.wikimedia.org/P20800 and previous config saved to /var/cache/conftool/dbconfig/20220215-142511-ladsgroup.json
  • 14:24 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts prometheus1004.eqiad.wmnet
  • 14:23 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus1004.eqiad.wmnet
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300381)', diff saved to https://phabricator.wikimedia.org/P20799 and previous config saved to /var/cache/conftool/dbconfig/20220215-141916-marostegui.json
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300381)', diff saved to https://phabricator.wikimedia.org/P20798 and previous config saved to /var/cache/conftool/dbconfig/20220215-141908-marostegui.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20797 and previous config saved to /var/cache/conftool/dbconfig/20220215-141411-ladsgroup.json
  • 14:07 hnowlan: removing java packages from maps2005
  • 14:06 volans: deployed spicerack v2.0.0 on cumin hosts
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300775)', diff saved to https://phabricator.wikimedia.org/P20796 and previous config saved to /var/cache/conftool/dbconfig/20220215-140408-marostegui.json
  • 14:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20795 and previous config saved to /var/cache/conftool/dbconfig/20220215-140404-marostegui.json
  • 14:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1022.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 14:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1022.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 14:02 volans@cumin2002: END (PASS) - Cookbook sre.hosts.test-cookbook (exit_code=0) testing new spicerack release
  • 14:02 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet with reason: testing new spicerack
  • 14:02 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin2002.codfw.wmnet with reason: testing new spicerack
  • 14:02 volans@cumin2002: START - Cookbook sre.hosts.test-cookbook testing new spicerack release
  • 14:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 14:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P20794 and previous config saved to /var/cache/conftool/dbconfig/20220215-135907-ladsgroup.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20793 and previous config saved to /var/cache/conftool/dbconfig/20220215-134859-marostegui.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P20792 and previous config saved to /var/cache/conftool/dbconfig/20220215-134402-ladsgroup.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300381)', diff saved to https://phabricator.wikimedia.org/P20791 and previous config saved to /var/cache/conftool/dbconfig/20220215-133354-marostegui.json
  • 13:33 vgutierrez: rolling restart of envoy on cp nodes
  • 13:33 vgutierrez: enable puppet on cache::(text|upload)_envoy nodes
  • 13:31 moritzm: installing lxml security updates
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20790 and previous config saved to /var/cache/conftool/dbconfig/20220215-132857-ladsgroup.json
  • 13:25 vgutierrez: disable puppet on cache::(text|upload)_envoy nodes
  • 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:16 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300381)', diff saved to https://phabricator.wikimedia.org/P20789 and previous config saved to /var/cache/conftool/dbconfig/20220215-131427-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1170.eqiad.wmnet with OS bullseye
  • 13:01 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus1006.eqiad.wmnet
  • 13:01 filippo@puppetmaster1001: conftool action : set/pooled=yes; selector: name=prometheus2006.codfw.wmnet
  • 13:00 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus2006.codfw.wmnet
  • 13:00 filippo@puppetmaster1001: conftool action : set/weight=10; selector: name=prometheus1006.eqiad.wmnet
  • 12:58 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.4.0 - volans@cumin2002
  • 12:57 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.4.0 - volans@cumin2002
  • 12:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300381)', diff saved to https://phabricator.wikimedia.org/P20788 and previous config saved to /var/cache/conftool/dbconfig/20220215-125548-marostegui.json
  • 12:54 volans@deploy1002: Finished deploy [homer/deploy@94bed87]: Release v0.4.0 (duration: 01m 28s)
  • 12:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1024.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 12:52 volans@deploy1002: Started deploy [homer/deploy@94bed87]: Release v0.4.0
  • 12:51 volans: uploaded spicerack_2.0.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 12:47 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic2035.codfw.wmnet
  • 12:46 marostegui@cumin1001: START - Cookbook sre.hosts.provision for host es1024.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1170.eqiad.wmnet with OS bullseye
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300510)', diff saved to https://phabricator.wikimedia.org/P20787 and previous config saved to /var/cache/conftool/dbconfig/20220215-124207-ladsgroup.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20786 and previous config saved to /var/cache/conftool/dbconfig/20220215-124043-marostegui.json
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20785 and previous config saved to /var/cache/conftool/dbconfig/20220215-124035-ladsgroup.json
  • 12:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:32 topranks: Modifying anycast_import policy on cr1-eqiad to validate / prep for changes to support wikidough IPv6.
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20784 and previous config saved to /var/cache/conftool/dbconfig/20220215-122533-marostegui.json
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2104.codfw.wmnet with OS bullseye
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300381)', diff saved to https://phabricator.wikimedia.org/P20783 and previous config saved to /var/cache/conftool/dbconfig/20220215-121028-marostegui.json
  • 11:50 sukhe: running homer for Gerrit 762788 and T301165
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300381)', diff saved to https://phabricator.wikimedia.org/P20782 and previous config saved to /var/cache/conftool/dbconfig/20220215-114950-marostegui.json
  • 11:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2104.codfw.wmnet with OS bullseye
  • 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 11:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:23 moritzm: rolling out Java 8 security updates for buster
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.22 refs T300198
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T300006)', diff saved to https://phabricator.wikimedia.org/P20781 and previous config saved to /var/cache/conftool/dbconfig/20220215-110420-ladsgroup.json
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 11:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 10:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300381)', diff saved to https://phabricator.wikimedia.org/P20780 and previous config saved to /var/cache/conftool/dbconfig/20220215-105354-marostegui.json
  • 10:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P20779 and previous config saved to /var/cache/conftool/dbconfig/20220215-103849-marostegui.json
  • 10:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:25 hnowlan@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Restarting to pick up Java security updates - hnowlan@cumin1001
  • 10:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P20778 and previous config saved to /var/cache/conftool/dbconfig/20220215-102345-marostegui.json
  • 10:23 ladsgroup@deploy1002: Synchronized wmf-config/db-production.php: Config: Revert "db-production: Stop writes to es5" (T300976) (duration: 00m 55s)
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Setting weight to es1023 T300006', diff saved to https://phabricator.wikimedia.org/P20777 and previous config saved to /var/cache/conftool/dbconfig/20220215-101817-root.json
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote es1023 to es5 primary and set section read-write T300006', diff saved to https://phabricator.wikimedia.org/P20776 and previous config saved to /var/cache/conftool/dbconfig/20220215-101412-root.json
  • 10:10 Amir1: Starting es5 eqiad failover from es1024 to es1023 - T300006
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300381)', diff saved to https://phabricator.wikimedia.org/P20775 and previous config saved to /var/cache/conftool/dbconfig/20220215-100840-marostegui.json
  • 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T300381)', diff saved to https://phabricator.wikimedia.org/P20774 and previous config saved to /var/cache/conftool/dbconfig/20220215-100333-marostegui.json
  • 10:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300381)', diff saved to https://phabricator.wikimedia.org/P20773 and previous config saved to /var/cache/conftool/dbconfig/20220215-100325-marostegui.json
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set es1023 with weight 0 T300006', diff saved to https://phabricator.wikimedia.org/P20772 and previous config saved to /var/cache/conftool/dbconfig/20220215-100253-ladsgroup.json
  • 10:01 ladsgroup@deploy1002: Synchronized wmf-config/db-production.php: Config: db-production: Stop writes to es5 (T300976) (duration: 00m 49s)
  • 10:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:58 hashar@deploy1002: Pruned MediaWiki: 1.38.0-wmf.20 (duration: 03m 08s)
  • 09:55 hashar@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.22 refs T300198 (duration: 45m 55s)
  • 09:49 moritzm: migrate instances off ganeti1022
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es5 T300006
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es5 T300006
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20771 and previous config saved to /var/cache/conftool/dbconfig/20220215-094821-marostegui.json
  • 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 09:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 09:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20769 and previous config saved to /var/cache/conftool/dbconfig/20220215-093316-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300381)', diff saved to https://phabricator.wikimedia.org/P20768 and previous config saved to /var/cache/conftool/dbconfig/20220215-091811-marostegui.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T300381)', diff saved to https://phabricator.wikimedia.org/P20767 and previous config saved to /var/cache/conftool/dbconfig/20220215-091606-marostegui.json
  • 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300381)', diff saved to https://phabricator.wikimedia.org/P20766 and previous config saved to /var/cache/conftool/dbconfig/20220215-091554-marostegui.json
  • 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 09:09 hashar@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.22 refs T300198
  • 09:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-serve2008.codfw.wmnet
  • 09:04 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-serve2007.codfw.wmnet
  • 08:56 volans: rolling out python3-wmflib 1.0.2-1 across the fleet
  • 08:54 moritzm: imported openjdk-8 8u322-b06-1~deb10u1 for buster-wikimedia (forward port of latest Java 8 security fixes)
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20764 and previous config saved to /var/cache/conftool/dbconfig/20220215-084544-marostegui.json
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2135.codfw.wmnet with OS bullseye
  • 08:32 moritzm: installing apache security updates on thanos nodes
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300381)', diff saved to https://phabricator.wikimedia.org/P20763 and previous config saved to /var/cache/conftool/dbconfig/20220215-083039-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T300381)', diff saved to https://phabricator.wikimedia.org/P20762 and previous config saved to /var/cache/conftool/dbconfig/20220215-082533-marostegui.json
  • 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300381)', diff saved to https://phabricator.wikimedia.org/P20761 and previous config saved to /var/cache/conftool/dbconfig/20220215-082519-marostegui.json
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2135.codfw.wmnet with OS bullseye
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20760 and previous config saved to /var/cache/conftool/dbconfig/20220215-081015-marostegui.json
  • 08:00 marostegui: Failover m3 from db1107 to db1183 - T301219
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20759 and previous config saved to /var/cache/conftool/dbconfig/20220215-075510-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300381)', diff saved to https://phabricator.wikimedia.org/P20758 and previous config saved to /var/cache/conftool/dbconfig/20220215-074005-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T300381)', diff saved to https://phabricator.wikimedia.org/P20757 and previous config saved to /var/cache/conftool/dbconfig/20220215-073701-marostegui.json
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300381)', diff saved to https://phabricator.wikimedia.org/P20756 and previous config saved to /var/cache/conftool/dbconfig/20220215-073653-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20755 and previous config saved to /var/cache/conftool/dbconfig/20220215-072149-marostegui.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20754 and previous config saved to /var/cache/conftool/dbconfig/20220215-070644-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300381)', diff saved to https://phabricator.wikimedia.org/P20753 and previous config saved to /var/cache/conftool/dbconfig/20220215-065139-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T300381)', diff saved to https://phabricator.wikimedia.org/P20752 and previous config saved to /var/cache/conftool/dbconfig/20220215-064631-marostegui.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300381)', diff saved to https://phabricator.wikimedia.org/P20751 and previous config saved to /var/cache/conftool/dbconfig/20220215-064209-marostegui.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20750 and previous config saved to /var/cache/conftool/dbconfig/20220215-062705-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20749 and previous config saved to /var/cache/conftool/dbconfig/20220215-061200-marostegui.json
  • 05:59 marostegui: Remove watchdog@10.% user from pc1-pc3 T301442
  • 05:58 marostegui: Remove watchdog@10.% user from es1-es5 T301442
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300381)', diff saved to https://phabricator.wikimedia.org/P20748 and previous config saved to /var/cache/conftool/dbconfig/20220215-055655-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T300381)', diff saved to https://phabricator.wikimedia.org/P20747 and previous config saved to /var/cache/conftool/dbconfig/20220215-055441-marostegui.json
  • 05:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling db2136 (after maint)', diff saved to https://phabricator.wikimedia.org/P20746 and previous config saved to /var/cache/conftool/dbconfig/20220215-023518-ladsgroup.json
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:14 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 06m 19s)
  • 02:09 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e"
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-02-14

  • 22:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:25 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:15 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:33 mutante: mx/exim: re-adding donate@wikimedia.org email alias (OTRS -> ITS) (T297915)
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20744 and previous config saved to /var/cache/conftool/dbconfig/20220214-202720-ladsgroup.json
  • 20:27 mutante: mx/exim: removing donate@wikimedia.org email alias (OTRS -> ITS) - was alias for fundraising@ (T297915)
  • 20:24 mutante: mx/exim: removing wikimania@wikimedia.org email alias (OTRS -> ITS) (T297915)
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P20743 and previous config saved to /var/cache/conftool/dbconfig/20220214-201215-ladsgroup.json
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P20742 and previous config saved to /var/cache/conftool/dbconfig/20220214-195711-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20741 and previous config saved to /var/cache/conftool/dbconfig/20220214-194206-ladsgroup.json
  • 19:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300662)', diff saved to https://phabricator.wikimedia.org/P20740 and previous config saved to /var/cache/conftool/dbconfig/20220214-193732-marostegui.json
  • 19:36 herron: prometheus2006 systemctl reset-failed
  • 19:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20739 and previous config saved to /var/cache/conftool/dbconfig/20220214-192227-marostegui.json
  • 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 19:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20738 and previous config saved to /var/cache/conftool/dbconfig/20220214-190722-marostegui.json
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20737 and previous config saved to /var/cache/conftool/dbconfig/20220214-190235-ladsgroup.json
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20736 and previous config saved to /var/cache/conftool/dbconfig/20220214-190228-ladsgroup.json
  • 19:01 volans: uploaded python3-wmflib_1.0.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 18:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300662)', diff saved to https://phabricator.wikimedia.org/P20735 and previous config saved to /var/cache/conftool/dbconfig/20220214-185218-marostegui.json
  • 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300662)', diff saved to https://phabricator.wikimedia.org/P20734 and previous config saved to /var/cache/conftool/dbconfig/20220214-185103-marostegui.json
  • 18:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 18:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300662)', diff saved to https://phabricator.wikimedia.org/P20733 and previous config saved to /var/cache/conftool/dbconfig/20220214-185056-marostegui.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P20732 and previous config saved to /var/cache/conftool/dbconfig/20220214-184723-ladsgroup.json
  • 18:44 mutante: contint2001 - disabling puppet, try replacing docker version (docker-io -> docker-ce), contint1001 first which is currently NOT the active server - gerrit:758987 T300682
  • 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20731 and previous config saved to /var/cache/conftool/dbconfig/20220214-183551-marostegui.json
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P20730 and previous config saved to /var/cache/conftool/dbconfig/20220214-183218-ladsgroup.json
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20729 and previous config saved to /var/cache/conftool/dbconfig/20220214-182046-marostegui.json
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20728 and previous config saved to /var/cache/conftool/dbconfig/20220214-181714-ladsgroup.json
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300662)', diff saved to https://phabricator.wikimedia.org/P20727 and previous config saved to /var/cache/conftool/dbconfig/20220214-180541-marostegui.json
  • 18:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300662)', diff saved to https://phabricator.wikimedia.org/P20726 and previous config saved to /var/cache/conftool/dbconfig/20220214-180427-marostegui.json
  • 18:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300662)', diff saved to https://phabricator.wikimedia.org/P20725 and previous config saved to /var/cache/conftool/dbconfig/20220214-180419-marostegui.json
  • 17:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts etherpad1002.eqiad.wmnet
  • 17:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20724 and previous config saved to /var/cache/conftool/dbconfig/20220214-174915-marostegui.json
  • 17:48 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts etherpad1002.eqiad.wmnet
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance - hw issues
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance - hw issues
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20722 and previous config saved to /var/cache/conftool/dbconfig/20220214-173526-ladsgroup.json
  • 17:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20721 and previous config saved to /var/cache/conftool/dbconfig/20220214-173410-marostegui.json
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (hw issue)', diff saved to https://phabricator.wikimedia.org/P20720 and previous config saved to /var/cache/conftool/dbconfig/20220214-172924-ladsgroup.json
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300662)', diff saved to https://phabricator.wikimedia.org/P20719 and previous config saved to /var/cache/conftool/dbconfig/20220214-171905-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300662)', diff saved to https://phabricator.wikimedia.org/P20718 and previous config saved to /var/cache/conftool/dbconfig/20220214-171750-marostegui.json
  • 17:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300662)', diff saved to https://phabricator.wikimedia.org/P20717 and previous config saved to /var/cache/conftool/dbconfig/20220214-171743-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20715 and previous config saved to /var/cache/conftool/dbconfig/20220214-170238-marostegui.json
  • 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:54 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 16:54 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 50s)
  • 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20714 and previous config saved to /var/cache/conftool/dbconfig/20220214-164733-marostegui.json
  • 16:40 razzi@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host datahubsearch1002.eqiad.wmnet
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300662)', diff saved to https://phabricator.wikimedia.org/P20713 and previous config saved to /var/cache/conftool/dbconfig/20220214-163228-marostegui.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300662)', diff saved to https://phabricator.wikimedia.org/P20712 and previous config saved to /var/cache/conftool/dbconfig/20220214-163113-marostegui.json
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 16:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 16:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 16:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 16:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300662)', diff saved to https://phabricator.wikimedia.org/P20711 and previous config saved to /var/cache/conftool/dbconfig/20220214-163016-marostegui.json
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20710 and previous config saved to /var/cache/conftool/dbconfig/20220214-161511-marostegui.json
  • 16:08 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1002.eqiad.wmnet
  • 16:07 jbond: update mx1001 to disable ldap validation of gmail emails gerrit:762442 (already on mx2001)
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20709 and previous config saved to /var/cache/conftool/dbconfig/20220214-160007-marostegui.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:45 vgutierrez: re-enable puppet on cp nodes running HAProxy - T290005
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300662)', diff saved to https://phabricator.wikimedia.org/P20708 and previous config saved to /var/cache/conftool/dbconfig/20220214-154502-marostegui.json
  • 15:43 sukhe: running authdns-update for T301165
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300662)', diff saved to https://phabricator.wikimedia.org/P20707 and previous config saved to /var/cache/conftool/dbconfig/20220214-154147-marostegui.json
  • 15:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300662)', diff saved to https://phabricator.wikimedia.org/P20706 and previous config saved to /var/cache/conftool/dbconfig/20220214-154139-marostegui.json
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298554)', diff saved to https://phabricator.wikimedia.org/P20705 and previous config saved to /var/cache/conftool/dbconfig/20220214-153811-ladsgroup.json
  • 15:37 jayme: published image docker-registry.discovery.wmnet/prometheus-statsd-exporter:0.0.10
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20704 and previous config saved to /var/cache/conftool/dbconfig/20220214-152635-marostegui.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P20703 and previous config saved to /var/cache/conftool/dbconfig/20220214-152306-ladsgroup.json
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20701 and previous config saved to /var/cache/conftool/dbconfig/20220214-151130-marostegui.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P20700 and previous config saved to /var/cache/conftool/dbconfig/20220214-150801-ladsgroup.json
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300662)', diff saved to https://phabricator.wikimedia.org/P20699 and previous config saved to /var/cache/conftool/dbconfig/20220214-145625-marostegui.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300662)', diff saved to https://phabricator.wikimedia.org/P20698 and previous config saved to /var/cache/conftool/dbconfig/20220214-145508-marostegui.json
  • 14:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 14:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300662)', diff saved to https://phabricator.wikimedia.org/P20697 and previous config saved to /var/cache/conftool/dbconfig/20220214-145501-marostegui.json
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298554)', diff saved to https://phabricator.wikimedia.org/P20696 and previous config saved to /var/cache/conftool/dbconfig/20220214-145257-ladsgroup.json
  • 14:51 vgutierrez: disable puppet on cp nodes running HAProxy - T290005
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20695 and previous config saved to /var/cache/conftool/dbconfig/20220214-143956-marostegui.json
  • 14:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:36 Lucas_WMDE: UTC afternoon backport window done
  • 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: General cleanup (T301647) (should be a no-op) (duration: 00m 48s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: trwikisource: Enable ULS webfonts by default (T283626) (duration: 00m 48s)
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:27 moritzm: installing Java 8/stretch security updates
  • 14:26 jnuche: Jenkins upgrade complete
  • 14:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [WikibaseMediaInfo] Make synonyms profile the default (T301559) (duration: 00m 48s)
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20694 and previous config saved to /var/cache/conftool/dbconfig/20220214-142452-marostegui.json
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix missing icons for apiportalwiki and wikimaniawiki (T301636) (duration: 00m 49s)
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298554)', diff saved to https://phabricator.wikimedia.org/P20693 and previous config saved to /var/cache/conftool/dbconfig/20220214-141304-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20692 and previous config saved to /var/cache/conftool/dbconfig/20220214-141251-ladsgroup.json
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ printf '%s\n' 'https://en.wikipedia.org/static/images/sul/foundation-black.png' | mwscript purgeList.php # T301636
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300662)', diff saved to https://phabricator.wikimedia.org/P20691 and previous config saved to /var/cache/conftool/dbconfig/20220214-140947-marostegui.json
  • 14:09 lucaswerkmeister-wmde@deploy1002: Synchronized static/images/sul/foundation-black.png: Config: Upload logo for apiportalwiki in wmgCentralAuthLoginIcon (T301636) (duration: 00m 49s)
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300662)', diff saved to https://phabricator.wikimedia.org/P20690 and previous config saved to /var/cache/conftool/dbconfig/20220214-140832-marostegui.json
  • 14:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300662)', diff saved to https://phabricator.wikimedia.org/P20689 and previous config saved to /var/cache/conftool/dbconfig/20220214-140824-marostegui.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P20688 and previous config saved to /var/cache/conftool/dbconfig/20220214-135746-ladsgroup.json
  • 13:54 jnuche: Jenkins contint instances are going to be restarted soon
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20687 and previous config saved to /var/cache/conftool/dbconfig/20220214-135320-marostegui.json
  • 13:47 moritzm: rolling restart of apache on logstash* to pick up expat security updates
  • 13:43 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4031.ulsfo.wmnet
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P20686 and previous config saved to /var/cache/conftool/dbconfig/20220214-134242-ladsgroup.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20685 and previous config saved to /var/cache/conftool/dbconfig/20220214-133815-marostegui.json
  • 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp4031.ulsfo.wmnet
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20684 and previous config saved to /var/cache/conftool/dbconfig/20220214-132736-ladsgroup.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300662)', diff saved to https://phabricator.wikimedia.org/P20683 and previous config saved to /var/cache/conftool/dbconfig/20220214-132310-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300662)', diff saved to https://phabricator.wikimedia.org/P20682 and previous config saved to /var/cache/conftool/dbconfig/20220214-132155-marostegui.json
  • 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300662)', diff saved to https://phabricator.wikimedia.org/P20681 and previous config saved to /var/cache/conftool/dbconfig/20220214-132135-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20680 and previous config saved to /var/cache/conftool/dbconfig/20220214-130630-marostegui.json
  • 12:53 arturo: merging https://gerrit.wikimedia.org/r/c/operations/homer/public/+/755478 to core routers
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20679 and previous config saved to /var/cache/conftool/dbconfig/20220214-125125-marostegui.json
  • 12:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1016.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1016.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298554)', diff saved to https://phabricator.wikimedia.org/P20678 and previous config saved to /var/cache/conftool/dbconfig/20220214-123636-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298554)', diff saved to https://phabricator.wikimedia.org/P20677 and previous config saved to /var/cache/conftool/dbconfig/20220214-123629-ladsgroup.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300662)', diff saved to https://phabricator.wikimedia.org/P20676 and previous config saved to /var/cache/conftool/dbconfig/20220214-123620-marostegui.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300662)', diff saved to https://phabricator.wikimedia.org/P20675 and previous config saved to /var/cache/conftool/dbconfig/20220214-123506-marostegui.json
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 12:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300662)', diff saved to https://phabricator.wikimedia.org/P20674 and previous config saved to /var/cache/conftool/dbconfig/20220214-123446-marostegui.json
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20673 and previous config saved to /var/cache/conftool/dbconfig/20220214-122124-ladsgroup.json
  • 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20672 and previous config saved to /var/cache/conftool/dbconfig/20220214-121941-marostegui.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20671 and previous config saved to /var/cache/conftool/dbconfig/20220214-120619-ladsgroup.json
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20670 and previous config saved to /var/cache/conftool/dbconfig/20220214-120436-marostegui.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P20669 and previous config saved to /var/cache/conftool/dbconfig/20220214-115250-marostegui.json
  • 11:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1021.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 11:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298554)', diff saved to https://phabricator.wikimedia.org/P20668 and previous config saved to /var/cache/conftool/dbconfig/20220214-115115-ladsgroup.json
  • 11:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1021.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300662)', diff saved to https://phabricator.wikimedia.org/P20667 and previous config saved to /var/cache/conftool/dbconfig/20220214-114931-marostegui.json
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300662)', diff saved to https://phabricator.wikimedia.org/P20666 and previous config saved to /var/cache/conftool/dbconfig/20220214-114817-marostegui.json
  • 11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 11:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298554)', diff saved to https://phabricator.wikimedia.org/P20665 and previous config saved to /var/cache/conftool/dbconfig/20220214-113850-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298554)', diff saved to https://phabricator.wikimedia.org/P20664 and previous config saved to /var/cache/conftool/dbconfig/20220214-113842-ladsgroup.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P20663 and previous config saved to /var/cache/conftool/dbconfig/20220214-112337-ladsgroup.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300382)', diff saved to https://phabricator.wikimedia.org/P20662 and previous config saved to /var/cache/conftool/dbconfig/20220214-111708-marostegui.json
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P20661 and previous config saved to /var/cache/conftool/dbconfig/20220214-110833-ladsgroup.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20660 and previous config saved to /var/cache/conftool/dbconfig/20220214-110203-marostegui.json
  • 10:56 moritzm: restart apache/FPM on mediawiki canaries to pick up expat security updates
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298554)', diff saved to https://phabricator.wikimedia.org/P20659 and previous config saved to /var/cache/conftool/dbconfig/20220214-105328-ladsgroup.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20658 and previous config saved to /var/cache/conftool/dbconfig/20220214-104659-marostegui.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298554)', diff saved to https://phabricator.wikimedia.org/P20657 and previous config saved to /var/cache/conftool/dbconfig/20220214-104143-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298554)', diff saved to https://phabricator.wikimedia.org/P20656 and previous config saved to /var/cache/conftool/dbconfig/20220214-104136-ladsgroup.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300382)', diff saved to https://phabricator.wikimedia.org/P20655 and previous config saved to /var/cache/conftool/dbconfig/20220214-103154-marostegui.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P20654 and previous config saved to /var/cache/conftool/dbconfig/20220214-102631-ladsgroup.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300382)', diff saved to https://phabricator.wikimedia.org/P20653 and previous config saved to /var/cache/conftool/dbconfig/20220214-102142-marostegui.json
  • 10:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300382)', diff saved to https://phabricator.wikimedia.org/P20652 and previous config saved to /var/cache/conftool/dbconfig/20220214-102135-marostegui.json
  • 10:12 jayme: published image docker-registry.discovery.wmnet/cfssl-issuer:0.2.2-1
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P20650 and previous config saved to /var/cache/conftool/dbconfig/20220214-101126-ladsgroup.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20649 and previous config saved to /var/cache/conftool/dbconfig/20220214-100630-marostegui.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298554)', diff saved to https://phabricator.wikimedia.org/P20648 and previous config saved to /var/cache/conftool/dbconfig/20220214-095622-ladsgroup.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20647 and previous config saved to /var/cache/conftool/dbconfig/20220214-095125-marostegui.json
  • 09:44 jayme: published image docker-registry.discovery.wmnet/cfssl-issuer:0.2.2-0
  • 09:40 vgutierrez: update haproxy to 2.4.12 on cp4032 - T290005
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300382)', diff saved to https://phabricator.wikimedia.org/P20646 and previous config saved to /var/cache/conftool/dbconfig/20220214-093621-marostegui.json
  • 09:34 vgutierrez: update haproxy to 2.4.12 on cp4026 - T290005
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300382)', diff saved to https://phabricator.wikimedia.org/P20645 and previous config saved to /var/cache/conftool/dbconfig/20220214-092602-marostegui.json
  • 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300382)', diff saved to https://phabricator.wikimedia.org/P20644 and previous config saved to /var/cache/conftool/dbconfig/20220214-092555-marostegui.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298554)', diff saved to https://phabricator.wikimedia.org/P20643 and previous config saved to /var/cache/conftool/dbconfig/20220214-091422-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20642 and previous config saved to /var/cache/conftool/dbconfig/20220214-091050-marostegui.json
  • 08:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2008.codfw.wmnet with OS bullseye
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20641 and previous config saved to /var/cache/conftool/dbconfig/20220214-085546-marostegui.json
  • 08:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:48 taavi: UTC morning deploys done (for real this time)
  • 08:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:45 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: prod: WRITE_NEW for CentralAuth hidden level migration (T289068) (duration: 00m 49s)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300382)', diff saved to https://phabricator.wikimedia.org/P20640 and previous config saved to /var/cache/conftool/dbconfig/20220214-084041-marostegui.json
  • 08:40 urbanecm: Reopen UTC morning B&C for a last deploy
  • 08:40 urbanecm: UTC morning B&C window done
  • 08:39 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1b0daef: Fixed typo for SectionTranslation in testwiki: lu -> lg (duration: 00m 48s)
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300382)', diff saved to https://phabricator.wikimedia.org/P20639 and previous config saved to /var/cache/conftool/dbconfig/20220214-083051-marostegui.json
  • 08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300382)', diff saved to https://phabricator.wikimedia.org/P20638 and previous config saved to /var/cache/conftool/dbconfig/20220214-083043-marostegui.json
  • 08:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2008.codfw.wmnet with OS bullseye
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:19 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=arywiki --fix # T291737
  • 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20637 and previous config saved to /var/cache/conftool/dbconfig/20220214-081538-marostegui.json
  • 08:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: db0e71e: arywiki: Add Portal and Draft namespaces (T291737) (duration: 00m 52s)
  • 08:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2007.codfw.wmnet with OS bullseye
  • 08:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 08:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20636 and previous config saved to /var/cache/conftool/dbconfig/20220214-080034-marostegui.json
  • 07:56 dcausse: restart blazegraph on wdqs1013 (jvm stuck for 26h)
  • 07:48 moritzm: installing expat security updates
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300382)', diff saved to https://phabricator.wikimedia.org/P20635 and previous config saved to /var/cache/conftool/dbconfig/20220214-074529-marostegui.json
  • 07:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS bullseye
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300382)', diff saved to https://phabricator.wikimedia.org/P20634 and previous config saved to /var/cache/conftool/dbconfig/20220214-073544-marostegui.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300382)', diff saved to https://phabricator.wikimedia.org/P20633 and previous config saved to /var/cache/conftool/dbconfig/20220214-071718-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P20632 and previous config saved to /var/cache/conftool/dbconfig/20220214-070214-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P20631 and previous config saved to /var/cache/conftool/dbconfig/20220214-064709-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300382)', diff saved to https://phabricator.wikimedia.org/P20630 and previous config saved to /var/cache/conftool/dbconfig/20220214-063204-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300382)', diff saved to https://phabricator.wikimedia.org/P20629 and previous config saved to /var/cache/conftool/dbconfig/20220214-062219-marostegui.json
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 06:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 05:56 marostegui: Deploy schema change on s5 master (db1130) T300775
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance

2022-02-13

  • 23:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20627 and previous config saved to /var/cache/conftool/dbconfig/20220213-231742-marostegui.json
  • 23:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20626 and previous config saved to /var/cache/conftool/dbconfig/20220213-230237-marostegui.json
  • 22:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P20625 and previous config saved to /var/cache/conftool/dbconfig/20220213-224733-marostegui.json
  • 22:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20624 and previous config saved to /var/cache/conftool/dbconfig/20220213-223228-marostegui.json
  • 19:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/includes/page/WikiPage.php: Backport: WikiPage: Cast the category values to string in updateCategoryCounts (T301433) (duration: 00m 49s)
  • 15:39 godog: shorten /var/log/swift/server.log.1 on thanos-be2001 to recover some space
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20623 and previous config saved to /var/cache/conftool/dbconfig/20220213-100348-marostegui.json
  • 10:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20622 and previous config saved to /var/cache/conftool/dbconfig/20220213-100340-marostegui.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20621 and previous config saved to /var/cache/conftool/dbconfig/20220213-094836-marostegui.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P20620 and previous config saved to /var/cache/conftool/dbconfig/20220213-093331-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20619 and previous config saved to /var/cache/conftool/dbconfig/20220213-091826-marostegui.json

2022-02-12

  • 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300775)', diff saved to https://phabricator.wikimedia.org/P20617 and previous config saved to /var/cache/conftool/dbconfig/20220212-225806-marostegui.json
  • 22:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 22:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:02 jelto: update gitlab-runner1001 and gitlab-runner2001 to gitlab-runner 14.7.0
  • 09:52 jelto: update gitlab1001 to gitlab-ce 14.7.2-ce.0
  • 09:41 jelto: update gitlab2001 to gitlab-ce 14.7.2-ce.0
  • 08:49 elukey: truncate /var/log/auth.log to 1g on krb1001 to free space on root partition (original log saved under /srv)
  • 07:23 dcausse: restarting blazegraph on wdqs1004 (jvm stuck for 4 hours)
  • 03:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 03:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 03:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20616 and previous config saved to /var/cache/conftool/dbconfig/20220212-032710-marostegui.json
  • 03:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P20615 and previous config saved to /var/cache/conftool/dbconfig/20220212-031205-marostegui.json
  • 02:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P20614 and previous config saved to /var/cache/conftool/dbconfig/20220212-025700-marostegui.json
  • 02:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20613 and previous config saved to /var/cache/conftool/dbconfig/20220212-024155-marostegui.json

2022-02-11

  • 23:23 inflatador: puppet-merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/762006
  • 22:47 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 22:36 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 22:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 22:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 22:20 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 22:09 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 21:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:41 tzatziki: removed 16 emails from accounts with deleteUserEmail.php
  • 19:14 mutante: running puppet on all ores machines to install aspell-hi (gerrit:761974) which for some reason was installed on a random subset of ores servers (1002,2001,2005 but not the other 19 ones) T300195 T252581 - after this the package is now installed on 18 servers (1001-1009, 2001-2009)
  • 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 16:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 16:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 16:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 16:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 16:32 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host datahubsearch1001.eqiad.wmnet
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20611 and previous config saved to /var/cache/conftool/dbconfig/20220211-161324-marostegui.json
  • 16:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:03 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host datahubsearch1001.eqiad.wmnet
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts auth2001.codfw.wmnet
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20610 and previous config saved to /var/cache/conftool/dbconfig/20220211-142045-root.json
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts auth2001.codfw.wmnet
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20609 and previous config saved to /var/cache/conftool/dbconfig/20220211-140540-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20608 and previous config saved to /var/cache/conftool/dbconfig/20220211-135037-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20607 and previous config saved to /var/cache/conftool/dbconfig/20220211-133533-root.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20606 and previous config saved to /var/cache/conftool/dbconfig/20220211-132028-root.json
  • 13:19 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1011.eqiad.wmnet with OS buster
  • 13:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T300662)', diff saved to https://phabricator.wikimedia.org/P20605 and previous config saved to /var/cache/conftool/dbconfig/20220211-131507-marostegui.json
  • 13:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS buster
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1016.eqiad.wmnet with OS buster
  • 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1016.eqiad.wmnet with OS buster
  • 10:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 10:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:42 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:42 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 10:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:41 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:40 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 10:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1021.eqiad.wmnet with OS buster
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1021.eqiad.wmnet with OS buster
  • 10:05 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 10:05 jelto@deploy1002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 09:29 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:29 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality' for release 'main' .
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20599 and previous config saved to /var/cache/conftool/dbconfig/20220211-090223-marostegui.json
  • 08:57 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti1011.eqiad.wmnet with OS buster
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS buster
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 06:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20598 and previous config saved to /var/cache/conftool/dbconfig/20220211-062306-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P20597 and previous config saved to /var/cache/conftool/dbconfig/20220211-060801-marostegui.json
  • 05:56 marostegui: Remove watchdog@10.% user from s6 codfw T301442
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P20596 and previous config saved to /var/cache/conftool/dbconfig/20220211-055256-marostegui.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20595 and previous config saved to /var/cache/conftool/dbconfig/20220211-053752-marostegui.json
  • 02:33 eileen: checkout revision (ccd5afc3 -> 815e3091)
  • 02:32 eileen: civicrm: revision 815e3091, config 02f4888c
  • 00:38 thcipriani: utc late backport Yes Done
  • 00:33 thcipriani@deploy1002: Synchronized dblists/desktop-improvements.dblist: Config: Make Vector 2022 the default skin for MediaWiki.org (T298519) (duration: 00m 48s)
  • 00:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:31 thcipriani@deploy1002: Synchronized wmf-config/config/mediawikiwiki.yaml: Config: Make Vector 2022 the default skin for MediaWiki.org (T298519) (duration: 00m 48s)
  • 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:16 bwang@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: urwiki: Add patroller usergroup (T301491) (duration: 00m 49s)
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 00:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298554)', diff saved to https://phabricator.wikimedia.org/P20594 and previous config saved to /var/cache/conftool/dbconfig/20220211-001425-ladsgroup.json

2022-02-10

  • 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P20593 and previous config saved to /var/cache/conftool/dbconfig/20220210-235920-ladsgroup.json
  • 23:54 cstone: Donation Interface revision changed from dbcb5254 to a6a9b63e
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P20592 and previous config saved to /var/cache/conftool/dbconfig/20220210-234416-ladsgroup.json
  • 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298554)', diff saved to https://phabricator.wikimedia.org/P20591 and previous config saved to /var/cache/conftool/dbconfig/20220210-232911-ladsgroup.json
  • 23:18 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298554)', diff saved to https://phabricator.wikimedia.org/P20590 and previous config saved to /var/cache/conftool/dbconfig/20220210-231004-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 23:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 22:39 mutante: etherpad - successfully switched to etherpad1003 (bullseye) and etherpad 1.8.16 - on second attempt after making it listen on IPv6 to work behind envoy (T300568) - https://gerrit.wikimedia.org/r/c/operations/puppet/+/761727/
  • 22:34 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 22:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 22:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 22:28 bblack@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 22:27 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS buster
  • 22:26 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 22:24 mutante: etherpad - one more short downtime for maintenance - downtimed in alertmanager and icinga
  • 22:04 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS buster
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298554)', diff saved to https://phabricator.wikimedia.org/P20589 and previous config saved to /var/cache/conftool/dbconfig/20220210-215354-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20588 and previous config saved to /var/cache/conftool/dbconfig/20220210-213849-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20587 and previous config saved to /var/cache/conftool/dbconfig/20220210-212344-ladsgroup.json
  • 21:16 bblack: cr1-eqiad - manual config, static fallback for high-traffic1 to lvs1017
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298554)', diff saved to https://phabricator.wikimedia.org/P20586 and previous config saved to /var/cache/conftool/dbconfig/20220210-210839-ladsgroup.json
  • 21:08 bblack: lvs1017 - bringing pybal online with real routing, flips high-traffic (text-cluster) traffic from lvs1020 -> lvs1017
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298554)', diff saved to https://phabricator.wikimedia.org/P20585 and previous config saved to /var/cache/conftool/dbconfig/20220210-204831-ladsgroup.json
  • 20:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298554)', diff saved to https://phabricator.wikimedia.org/P20584 and previous config saved to /var/cache/conftool/dbconfig/20220210-204818-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20583 and previous config saved to /var/cache/conftool/dbconfig/20220210-203313-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20582 and previous config saved to /var/cache/conftool/dbconfig/20220210-201808-ladsgroup.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:08 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.21 refs T300197
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298554)', diff saved to https://phabricator.wikimedia.org/P20581 and previous config saved to /var/cache/conftool/dbconfig/20220210-200304-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298554)', diff saved to https://phabricator.wikimedia.org/P20580 and previous config saved to /var/cache/conftool/dbconfig/20220210-194518-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298554)', diff saved to https://phabricator.wikimedia.org/P20579 and previous config saved to /var/cache/conftool/dbconfig/20220210-194510-ladsgroup.json
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20578 and previous config saved to /var/cache/conftool/dbconfig/20220210-193005-ladsgroup.json
  • 19:25 bblack: lvs1017 reboot again for clean network config - T301142
  • 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20577 and previous config saved to /var/cache/conftool/dbconfig/20220210-191501-ladsgroup.json
  • 19:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@828a428] (eqiad): Configure geoshapes postgres max conns (duration: 01m 29s)
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:13 urbanecm@deploy1002: Synchronized wmf-config/flaggedrevs.php: 72f3b31: Migrate $wmfStandardAutoPromote to $wmgStandardAutoPromote (T45956) (duration: 00m 49s)
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:12 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@828a428] (eqiad): Configure geoshapes postgres max conns
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 bblack: lvs1017 rebooting for sanity-check after prod config - T301142
  • 19:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300382)', diff saved to https://phabricator.wikimedia.org/P20576 and previous config saved to /var/cache/conftool/dbconfig/20220210-190840-marostegui.json
  • 19:03 otto@deploy1002: Finished deploy [airflow-dags/research@b871faf]: (no justification provided) (duration: 00m 03s)
  • 19:03 otto@deploy1002: Started deploy [airflow-dags/research@b871faf]: (no justification provided)
  • 19:01 otto@deploy1002: Finished deploy [airflow-dags/research@b871faf]: (no justification provided) (duration: 00m 27s)
  • 19:01 otto@deploy1002: Started deploy [airflow-dags/research@b871faf]: (no justification provided)
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298554)', diff saved to https://phabricator.wikimedia.org/P20575 and previous config saved to /var/cache/conftool/dbconfig/20220210-185956-ladsgroup.json
  • 18:53 ebernhardson: restart all mjolnir daemons on search-loader1001 and 2001 to purge old cached node lists
  • 18:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20574 and previous config saved to /var/cache/conftool/dbconfig/20220210-185336-marostegui.json
  • 18:52 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync on production
  • 18:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:49 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply on staging
  • 18:49 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply on production
  • 18:49 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: sync on production
  • 18:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:46 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply on staging
  • 18:46 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply on production
  • 18:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: sync on staging
  • 18:45 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1031.eqiad.wmnet with OS buster
  • 18:45 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1032.eqiad.wmnet with OS buster
  • 18:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply on production
  • 18:45 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1033.eqiad.wmnet with OS buster
  • 18:45 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply on staging
  • 18:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply on staging
  • 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:43 bblack: lvs1013 - stopping puppet+pybal for move to lvs1017, high-traffic1 traffic fails over to lvs1020 for now - T301142
  • 18:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:42 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/includes/content/ContentHandler.php: Backport: ContentHandler: Avoiding saving in ParserCache in search index jobs (T285993) (duration: 00m 50s)
  • 18:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:40 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.20/includes/content/ContentHandler.php: Backport: ContentHandler: Avoiding saving in ParserCache in search index jobs (T285993) (duration: 00m 50s)
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300775)', diff saved to https://phabricator.wikimedia.org/P20573 and previous config saved to /var/cache/conftool/dbconfig/20220210-184012-marostegui.json
  • 18:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300775)', diff saved to https://phabricator.wikimedia.org/P20572 and previous config saved to /var/cache/conftool/dbconfig/20220210-184004-marostegui.json
  • 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20571 and previous config saved to /var/cache/conftool/dbconfig/20220210-183831-marostegui.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2088:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20570 and previous config saved to /var/cache/conftool/dbconfig/20220210-183107-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298554)', diff saved to https://phabricator.wikimedia.org/P20569 and previous config saved to /var/cache/conftool/dbconfig/20220210-182959-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 18:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298554)', diff saved to https://phabricator.wikimedia.org/P20568 and previous config saved to /var/cache/conftool/dbconfig/20220210-182952-ladsgroup.json
  • 18:29 bblack@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:28 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a5be8ac] (eqiad): Remove references to cassandra `storage_id` (duration: 01m 01s)
  • 18:27 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a5be8ac] (eqiad): Remove references to cassandra `storage_id`
  • 18:26 bblack@cumin1001: START - Cookbook sre.dns.netbox
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2088:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20567 and previous config saved to /var/cache/conftool/dbconfig/20220210-182547-ladsgroup.json
  • 18:25 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a5be8ac] (eqiad): Remove references to cassandra `storage_id` (duration: 00m 15s)
  • 18:25 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a5be8ac] (eqiad): Remove references to cassandra `storage_id`
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P20566 and previous config saved to /var/cache/conftool/dbconfig/20220210-182500-marostegui.json
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300382)', diff saved to https://phabricator.wikimedia.org/P20565 and previous config saved to /var/cache/conftool/dbconfig/20220210-182326-marostegui.json
  • 18:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS buster
  • 18:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 18:16 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS buster
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20564 and previous config saved to /var/cache/conftool/dbconfig/20220210-181447-ladsgroup.json
  • 18:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@bf5fb8e] (eqiad): Remove unused kartotherian-postgres reference (duration: 00m 14s)
  • 18:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@bf5fb8e] (eqiad): Remove unused kartotherian-postgres reference
  • 18:12 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@5699db7] (eqiad): Remove unused kartotherian-layermixer reference (duration: 04m 52s)
  • 18:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2088.codfw.wmnet with OS bullseye
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P20563 and previous config saved to /var/cache/conftool/dbconfig/20220210-180955-marostegui.json
  • 18:07 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@5699db7] (eqiad): Remove unused kartotherian-layermixer reference
  • 18:06 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@4312bc3] (eqiad): Update kartotherian-package to dd11f2d (duration: 05m 58s)
  • 18:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@4312bc3] (eqiad): Update kartotherian-package to dd11f2d
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20562 and previous config saved to /var/cache/conftool/dbconfig/20220210-175942-ladsgroup.json
  • 17:57 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@4312bc3] (eqiad): Update kartotherian-package to dd11f2d (duration: 05m 59s)
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300775)', diff saved to https://phabricator.wikimedia.org/P20561 and previous config saved to /var/cache/conftool/dbconfig/20220210-175450-marostegui.json
  • 17:51 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@4312bc3] (eqiad): Update kartotherian-package to dd11f2d
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298554)', diff saved to https://phabricator.wikimedia.org/P20560 and previous config saved to /var/cache/conftool/dbconfig/20220210-174438-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2088.codfw.wmnet with OS bullseye
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2088:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20559 and previous config saved to /var/cache/conftool/dbconfig/20220210-173957-ladsgroup.json
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2088:3311 (T300510)', diff saved to https://phabricator.wikimedia.org/P20558 and previous config saved to /var/cache/conftool/dbconfig/20220210-173932-ladsgroup.json
  • 17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 17:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1011.eqiad.wmnet with OS stretch
  • 17:31 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1010.eqiad.wmnet with OS stretch
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1009.eqiad.wmnet with OS stretch
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298554)', diff saved to https://phabricator.wikimedia.org/P20557 and previous config saved to /var/cache/conftool/dbconfig/20220210-172635-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300382)', diff saved to https://phabricator.wikimedia.org/P20556 and previous config saved to /var/cache/conftool/dbconfig/20220210-172307-marostegui.json
  • 17:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300382)', diff saved to https://phabricator.wikimedia.org/P20555 and previous config saved to /var/cache/conftool/dbconfig/20220210-172300-marostegui.json
  • 17:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 17:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 17:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 17:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 17:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 17:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 17:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1011.eqiad.wmnet with OS stretch
  • 17:10 rzl: rzl@cumin2001:~$ sudo cumin A:mw "enable-puppet T273323"
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20553 and previous config saved to /var/cache/conftool/dbconfig/20220210-170755-marostegui.json
  • 17:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1010.eqiad.wmnet with OS stretch
  • 17:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1009.eqiad.wmnet with OS stretch
  • 17:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbmonitor1002.wikimedia.org
  • 17:03 rzl: rzl@cumin2001:~$ sudo cumin A:mw "disable-puppet T273323"
  • 17:01 mutante: etherpad going down for maintenance
  • 16:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbmonitor1002.wikimedia.org
  • 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20552 and previous config saved to /var/cache/conftool/dbconfig/20220210-165250-marostegui.json
  • 16:50 otto@deploy1002: Finished deploy [airflow-dags/analytics@5b6ba8e]: (no justification provided) (duration: 00m 10s)
  • 16:50 otto@deploy1002: Started deploy [airflow-dags/analytics@5b6ba8e]: (no justification provided)
  • 16:50 otto@deploy1002: Finished deploy [airflow-dags/analytics@5b6ba8e]: (no justification provided) (duration: 01m 46s)
  • 16:48 otto@deploy1002: Started deploy [airflow-dags/analytics@5b6ba8e]: (no justification provided)
  • 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300382)', diff saved to https://phabricator.wikimedia.org/P20551 and previous config saved to /var/cache/conftool/dbconfig/20220210-163746-marostegui.json
  • 16:37 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@5b6ba8e]: (no justification provided) (duration: 00m 08s)
  • 16:37 otto@deploy1002: Started deploy [airflow-dags/analytics_test@5b6ba8e]: (no justification provided)
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300382)', diff saved to https://phabricator.wikimedia.org/P20550 and previous config saved to /var/cache/conftool/dbconfig/20220210-163633-marostegui.json
  • 16:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300382)', diff saved to https://phabricator.wikimedia.org/P20549 and previous config saved to /var/cache/conftool/dbconfig/20220210-163620-marostegui.json
  • 16:22 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided) (duration: 00m 11s)
  • 16:22 otto@deploy1002: Started deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided)
  • 16:22 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided) (duration: 07m 49s)
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20548 and previous config saved to /var/cache/conftool/dbconfig/20220210-162115-marostegui.json
  • 16:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:14 otto@deploy1002: Started deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided)
  • 16:14 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided) (duration: 04m 19s)
  • 16:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:09 otto@deploy1002: Started deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided)
  • 16:09 ppchelko@deploy1002: Synchronized w/tmp_settings_bench.php: Config: gerrit 761433 settings benchmark - measure new static php array config load (duration: 00m 49s)
  • 16:08 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided) (duration: 00m 46s)
  • 16:07 otto@deploy1002: Started deploy [airflow-dags/analytics_test@66d6cad]: (no justification provided)
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20547 and previous config saved to /var/cache/conftool/dbconfig/20220210-160611-marostegui.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298554)', diff saved to https://phabricator.wikimedia.org/P20546 and previous config saved to /var/cache/conftool/dbconfig/20220210-160417-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T300510)', diff saved to https://phabricator.wikimedia.org/P20545 and previous config saved to /var/cache/conftool/dbconfig/20220210-160046-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T300510)', diff saved to https://phabricator.wikimedia.org/P20544 and previous config saved to /var/cache/conftool/dbconfig/20220210-160003-ladsgroup.json
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300382)', diff saved to https://phabricator.wikimedia.org/P20543 and previous config saved to /var/cache/conftool/dbconfig/20220210-155106-marostegui.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20542 and previous config saved to /var/cache/conftool/dbconfig/20220210-154913-ladsgroup.json
  • 15:39 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.20/includes/Storage/DerivedPageDataUpdater.php: Backport: DerivedPageDataUpdater: Set ParserOutput when it's passed to it (T301309) (duration: 00m 50s)
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20541 and previous config saved to /var/cache/conftool/dbconfig/20220210-153408-ladsgroup.json
  • 15:32 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/includes/Storage/DerivedPageDataUpdater.php: Backport: DerivedPageDataUpdater: Set ParserOutput when it's passed to it (T301309) (duration: 00m 53s)
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2138.codfw.wmnet with OS bullseye
  • 15:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298554)', diff saved to https://phabricator.wikimedia.org/P20538 and previous config saved to /var/cache/conftool/dbconfig/20220210-151903-ladsgroup.json
  • 15:17 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 15:16 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:57 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2138.codfw.wmnet with OS bullseye
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300382)', diff saved to https://phabricator.wikimedia.org/P20537 and previous config saved to /var/cache/conftool/dbconfig/20220210-145047-marostegui.json
  • 14:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20536 and previous config saved to /var/cache/conftool/dbconfig/20220210-145040-marostegui.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138 (T300510)', diff saved to https://phabricator.wikimedia.org/P20535 and previous config saved to /var/cache/conftool/dbconfig/20220210-144913-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20534 and previous config saved to /var/cache/conftool/dbconfig/20220210-143535-marostegui.json
  • 14:23 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-serve2006.codfw.wmnet
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20533 and previous config saved to /var/cache/conftool/dbconfig/20220210-142030-marostegui.json
  • 14:19 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-serve2006.codfw.wmnet
  • 14:19 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ml-serve2005.codfw.wmnet
  • 14:10 elukey: `elukey@cumin1001:~$ homer 'cr*codfw*' commit "Add ml-serve2006 to the k8s ml-serve-codfw cluster's neighbors"`
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20532 and previous config saved to /var/cache/conftool/dbconfig/20220210-140525-marostegui.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298554)', diff saved to https://phabricator.wikimedia.org/P20531 and previous config saved to /var/cache/conftool/dbconfig/20220210-140500-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:00 moritzm: installing apache security updates on phab1001/phabricator.wikimedia.org
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20530 and previous config saved to /var/cache/conftool/dbconfig/20220210-135411-marostegui.json
  • 13:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 13:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 13:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300382)', diff saved to https://phabricator.wikimedia.org/P20529 and previous config saved to /var/cache/conftool/dbconfig/20220210-135332-marostegui.json
  • 13:50 moritzm: installing apache security updates on otrs1001/ticket.wikimedia.org
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20527 and previous config saved to /var/cache/conftool/dbconfig/20220210-133827-marostegui.json
  • 13:28 moritzm: installing lxml security updates
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20526 and previous config saved to /var/cache/conftool/dbconfig/20220210-132323-marostegui.json
  • 13:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus1003.eqiad.wmnet
  • 13:09 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus1003.eqiad.wmnet
  • 13:08 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts prometheus2003.codfw.wmnet
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300382)', diff saved to https://phabricator.wikimedia.org/P20525 and previous config saved to /var/cache/conftool/dbconfig/20220210-130818-marostegui.json
  • 12:59 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus2003.codfw.wmnet
  • 12:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298554)', diff saved to https://phabricator.wikimedia.org/P20524 and previous config saved to /var/cache/conftool/dbconfig/20220210-125850-ladsgroup.json
  • 12:58 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts prometheus2003.codfw.wmnet
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300382)', diff saved to https://phabricator.wikimedia.org/P20523 and previous config saved to /var/cache/conftool/dbconfig/20220210-125503-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20522 and previous config saved to /var/cache/conftool/dbconfig/20220210-125456-marostegui.json
  • 12:50 moritzm: installing apr security updates
  • 12:49 filippo@cumin1001: START - Cookbook sre.hosts.decommission for hosts prometheus2003.codfw.wmnet
  • 12:48 Lucas_WMDE: printf '%s\n' 'https://query.wikidata.org/index.html' 'https://query.wikidata.org/embed.html' | mwscript purgeList.php # T301457 just in case
  • 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20521 and previous config saved to /var/cache/conftool/dbconfig/20220210-124346-ladsgroup.json
  • 12:40 taavi: UTC morning deploys done
  • 12:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20520 and previous config saved to /var/cache/conftool/dbconfig/20220210-123951-marostegui.json
  • 12:39 taavi@deploy1002: Synchronized logos/config.yaml: Config: banwikisource: Fix logo size (T296459) (duration: 00m 49s)
  • 12:39 taavi: purge banwikisource logos via purgeList.php T296459
  • 12:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:39 taavi@deploy1002: Synchronized wmf-config/logos.php: Config: banwikisource: Fix logo size (T296459) (duration: 00m 49s)
  • 12:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:38 taavi@deploy1002: Synchronized static/images/project-logos/: Config: banwikisource: Fix logo size (T296459) (duration: 00m 50s)
  • 12:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:34 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: move ombudsmen.wikimedia.org to ombuds.wikimedia.org (T273323) (duration: 00m 49s)
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:30 taavi@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: MWMultiVersion: move ombudsmen.wikimedia.org to ombuds.wikimedia.org (T273323) (duration: 00m 49s)
  • 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20519 and previous config saved to /var/cache/conftool/dbconfig/20220210-122841-ladsgroup.json
  • 12:25 taavi@deploy1002: Synchronized wmf-config/MetaContactPages.php: Config: Define a contact form for Chapter/Thorg application status (T298024) (duration: 00m 50s)
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20518 and previous config saved to /var/cache/conftool/dbconfig/20220210-122446-marostegui.json
  • 12:23 moritzm: installing pillow security updates
  • 12:18 taavi: echo "https://query.wikidata.org/" | mwscript purgeList.php # T301457
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298554)', diff saved to https://phabricator.wikimedia.org/P20517 and previous config saved to /var/cache/conftool/dbconfig/20220210-121336-ladsgroup.json
  • 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20516 and previous config saved to /var/cache/conftool/dbconfig/20220210-120941-marostegui.json
  • 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20515 and previous config saved to /var/cache/conftool/dbconfig/20220210-120729-marostegui.json
  • 12:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20514 and previous config saved to /var/cache/conftool/dbconfig/20220210-120701-marostegui.json
  • 11:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase2009.codfw.wmnet
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20513 and previous config saved to /var/cache/conftool/dbconfig/20220210-115156-marostegui.json
  • 11:43 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2009.codfw.wmnet
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20512 and previous config saved to /var/cache/conftool/dbconfig/20220210-114224-root.json
  • 11:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase2010.codfw.wmnet
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20511 and previous config saved to /var/cache/conftool/dbconfig/20220210-113651-marostegui.json
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts restbase2010.codfw.wmnet
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20510 and previous config saved to /var/cache/conftool/dbconfig/20220210-112720-root.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20509 and previous config saved to /var/cache/conftool/dbconfig/20220210-112147-marostegui.json
  • 11:21 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 11:21 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on external
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300382)', diff saved to https://phabricator.wikimedia.org/P20508 and previous config saved to /var/cache/conftool/dbconfig/20220210-112034-marostegui.json
  • 11:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:20 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:20 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on external
  • 11:20 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on internal
  • 11:19 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 11:18 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on external
  • 11:18 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:18 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on external
  • 11:18 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on internal
  • 11:17 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 11:16 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 11:16 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 11:16 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 11:16 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:16 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on external
  • 11:16 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on internal
  • 11:15 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:15 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 11:15 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 11:15 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 11:14 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:14 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 11:14 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 11:14 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 11:14 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20507 and previous config saved to /var/cache/conftool/dbconfig/20220210-111217-root.json
  • 11:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync on external
  • 11:10 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:10 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on internal
  • 11:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply on external
  • 11:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on internal
  • 11:08 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync on external
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:06 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply on staging
  • 11:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on internal
  • 11:06 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply on external
  • 11:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:05 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync on staging
  • 11:04 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on external
  • 11:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply on internal
  • 11:03 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply on staging
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1021.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 11:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1021.eqiad.wmnet with reason: Remove from Ganeti cluster for reimage
  • 11:01 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: Short circut updating stats when the page is not reviewable (T301433) (duration: 00m 49s)
  • 11:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298554)', diff saved to https://phabricator.wikimedia.org/P20506 and previous config saved to /var/cache/conftool/dbconfig/20220210-105853-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.21/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: Short circut updating stats when the page is not reviewable (T301433) (duration: 00m 50s)
  • 10:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20505 and previous config saved to /var/cache/conftool/dbconfig/20220210-105713-root.json
  • 10:46 moritzm: installing ruby2.5 security updates
  • 10:44 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/homer/public/+/761435 to core routers
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20503 and previous config saved to /var/cache/conftool/dbconfig/20220210-104208-root.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20502 and previous config saved to /var/cache/conftool/dbconfig/20220210-103324-marostegui.json
  • 10:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300382)', diff saved to https://phabricator.wikimedia.org/P20501 and previous config saved to /var/cache/conftool/dbconfig/20220210-103317-marostegui.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20500 and previous config saved to /var/cache/conftool/dbconfig/20220210-101812-marostegui.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20499 and previous config saved to /var/cache/conftool/dbconfig/20220210-100307-marostegui.json
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298554)', diff saved to https://phabricator.wikimedia.org/P20498 and previous config saved to /var/cache/conftool/dbconfig/20220210-094929-ladsgroup.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300382)', diff saved to https://phabricator.wikimedia.org/P20497 and previous config saved to /var/cache/conftool/dbconfig/20220210-094802-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T300382)', diff saved to https://phabricator.wikimedia.org/P20496 and previous config saved to /var/cache/conftool/dbconfig/20220210-094655-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20495 and previous config saved to /var/cache/conftool/dbconfig/20220210-094647-marostegui.json
  • 09:43 elukey: update pcc facts
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20494 and previous config saved to /var/cache/conftool/dbconfig/20220210-093425-ladsgroup.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P20493 and previous config saved to /var/cache/conftool/dbconfig/20220210-093141-marostegui.json
  • 09:30 marostegui: Remove watchdog@10.% user from db2071 T301442
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchanges group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20492 and previous config saved to /var/cache/conftool/dbconfig/20220210-092727-marostegui.json
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20491 and previous config saved to /var/cache/conftool/dbconfig/20220210-091920-ladsgroup.json
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298554)', diff saved to https://phabricator.wikimedia.org/P20489 and previous config saved to /var/cache/conftool/dbconfig/20220210-090415-ladsgroup.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20488 and previous config saved to /var/cache/conftool/dbconfig/20220210-090129-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20487 and previous config saved to /var/cache/conftool/dbconfig/20220210-090023-marostegui.json
  • 09:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300382)', diff saved to https://phabricator.wikimedia.org/P20486 and previous config saved to /var/cache/conftool/dbconfig/20220210-090016-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20485 and previous config saved to /var/cache/conftool/dbconfig/20220210-084511-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20484 and previous config saved to /var/cache/conftool/dbconfig/20220210-083006-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300382)', diff saved to https://phabricator.wikimedia.org/P20483 and previous config saved to /var/cache/conftool/dbconfig/20220210-081501-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T300382)', diff saved to https://phabricator.wikimedia.org/P20482 and previous config saved to /var/cache/conftool/dbconfig/20220210-081354-marostegui.json
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300382)', diff saved to https://phabricator.wikimedia.org/P20481 and previous config saved to /var/cache/conftool/dbconfig/20220210-081340-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20480 and previous config saved to /var/cache/conftool/dbconfig/20220210-075836-marostegui.json
  • 07:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298554)', diff saved to https://phabricator.wikimedia.org/P20479 and previous config saved to /var/cache/conftool/dbconfig/20220210-074404-ladsgroup.json
  • 07:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298554)', diff saved to https://phabricator.wikimedia.org/P20478 and previous config saved to /var/cache/conftool/dbconfig/20220210-074356-ladsgroup.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20477 and previous config saved to /var/cache/conftool/dbconfig/20220210-074331-marostegui.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300775)', diff saved to https://phabricator.wikimedia.org/P20476 and previous config saved to /var/cache/conftool/dbconfig/20220210-072933-marostegui.json
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300775)', diff saved to https://phabricator.wikimedia.org/P20475 and previous config saved to /var/cache/conftool/dbconfig/20220210-072925-marostegui.json
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20474 and previous config saved to /var/cache/conftool/dbconfig/20220210-072852-ladsgroup.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300382)', diff saved to https://phabricator.wikimedia.org/P20473 and previous config saved to /var/cache/conftool/dbconfig/20220210-072826-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T300382)', diff saved to https://phabricator.wikimedia.org/P20472 and previous config saved to /var/cache/conftool/dbconfig/20220210-072718-marostegui.json
  • 07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20471 and previous config saved to /var/cache/conftool/dbconfig/20220210-072711-marostegui.json
  • 07:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2006.codfw.wmnet with OS bullseye
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P20470 and previous config saved to /var/cache/conftool/dbconfig/20220210-071421-marostegui.json
  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20469 and previous config saved to /var/cache/conftool/dbconfig/20220210-071347-ladsgroup.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20468 and previous config saved to /var/cache/conftool/dbconfig/20220210-071206-marostegui.json
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1115.eqiad.wmnet with OS bullseye
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P20467 and previous config saved to /var/cache/conftool/dbconfig/20220210-065916-marostegui.json
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298554)', diff saved to https://phabricator.wikimedia.org/P20466 and previous config saved to /var/cache/conftool/dbconfig/20220210-065842-ladsgroup.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20465 and previous config saved to /var/cache/conftool/dbconfig/20220210-065701-marostegui.json
  • 06:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2006.codfw.wmnet with OS bullseye
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300775)', diff saved to https://phabricator.wikimedia.org/P20464 and previous config saved to /var/cache/conftool/dbconfig/20220210-064411-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20463 and previous config saved to /var/cache/conftool/dbconfig/20220210-064156-marostegui.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300775)', diff saved to https://phabricator.wikimedia.org/P20462 and previous config saved to /var/cache/conftool/dbconfig/20220210-064149-marostegui.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20461 and previous config saved to /var/cache/conftool/dbconfig/20220210-064059-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T300382)', diff saved to https://phabricator.wikimedia.org/P20460 and previous config saved to /var/cache/conftool/dbconfig/20220210-064049-marostegui.json
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300382)', diff saved to https://phabricator.wikimedia.org/P20459 and previous config saved to /var/cache/conftool/dbconfig/20220210-064021-marostegui.json
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1115.eqiad.wmnet with OS bullseye
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20458 and previous config saved to /var/cache/conftool/dbconfig/20220210-062556-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20457 and previous config saved to /var/cache/conftool/dbconfig/20220210-062517-marostegui.json
  • 06:23 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1115.eqiad.wmnet with OS bullseye
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1115.eqiad.wmnet with OS bullseye
  • 06:13 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1115.eqiad.wmnet with OS bullseye
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20456 and previous config saved to /var/cache/conftool/dbconfig/20220210-061052-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20455 and previous config saved to /var/cache/conftool/dbconfig/20220210-061012-marostegui.json
  • 06:07 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1115.eqiad.wmnet with OS bullseye
  • 06:01 marostegui: Drop tendril database from db1115 T297605
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20454 and previous config saved to /var/cache/conftool/dbconfig/20220210-055548-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300382)', diff saved to https://phabricator.wikimedia.org/P20453 and previous config saved to /var/cache/conftool/dbconfig/20220210-055507-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T300382)', diff saved to https://phabricator.wikimedia.org/P20452 and previous config saved to /var/cache/conftool/dbconfig/20220210-055400-marostegui.json
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove recentchangeslinked group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20451 and previous config saved to /var/cache/conftool/dbconfig/20220210-054911-marostegui.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20450 and previous config saved to /var/cache/conftool/dbconfig/20220210-054045-root.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298554)', diff saved to https://phabricator.wikimedia.org/P20449 and previous config saved to /var/cache/conftool/dbconfig/20220210-054003-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298554)', diff saved to https://phabricator.wikimedia.org/P20448 and previous config saved to /var/cache/conftool/dbconfig/20220210-053956-ladsgroup.json
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20447 and previous config saved to /var/cache/conftool/dbconfig/20220210-052451-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20446 and previous config saved to /var/cache/conftool/dbconfig/20220210-050946-ladsgroup.json
  • 04:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298554)', diff saved to https://phabricator.wikimedia.org/P20445 and previous config saved to /var/cache/conftool/dbconfig/20220210-045442-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298554)', diff saved to https://phabricator.wikimedia.org/P20444 and previous config saved to /var/cache/conftool/dbconfig/20220210-032310-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 03:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298554)', diff saved to https://phabricator.wikimedia.org/P20443 and previous config saved to /var/cache/conftool/dbconfig/20220210-032303-ladsgroup.json
  • 03:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20442 and previous config saved to /var/cache/conftool/dbconfig/20220210-030758-ladsgroup.json
  • 02:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20441 and previous config saved to /var/cache/conftool/dbconfig/20220210-025253-ladsgroup.json
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298554)', diff saved to https://phabricator.wikimedia.org/P20440 and previous config saved to /var/cache/conftool/dbconfig/20220210-023749-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298554)', diff saved to https://phabricator.wikimedia.org/P20439 and previous config saved to /var/cache/conftool/dbconfig/20220210-011920-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 01:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: jawikivoyage: Change module talk namespace from トーク to ノート (T262155) (duration: 00m 50s)
  • 00:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:19 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: jawikivoyage: Change talk namespace names from トーク to ノート (T262155) (duration: 00m 54s)
  • 00:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance

2022-02-09

  • 23:48 mutante: apt1001 - delete etherpad-lite for bullseye source package, built, uploaded and imported 1.8.16-2 in bullseye-wikimedia, now source and binary packages in APT, simulated install on etherpad1003 works T300568
  • 23:18 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[1032-1038,1040-1042,1044-1047].eqiad.wmnet
  • 23:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 23:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 23:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 23:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298554)', diff saved to https://phabricator.wikimedia.org/P20438 and previous config saved to /var/cache/conftool/dbconfig/20220209-230745-ladsgroup.json
  • 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20437 and previous config saved to /var/cache/conftool/dbconfig/20220209-225240-ladsgroup.json
  • 22:50 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1032-1038,1040-1042,1044-1047].eqiad.wmnet
  • 22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20435 and previous config saved to /var/cache/conftool/dbconfig/20220209-223736-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298554)', diff saved to https://phabricator.wikimedia.org/P20434 and previous config saved to /var/cache/conftool/dbconfig/20220209-222231-ladsgroup.json
  • 21:51 hoo: T299422: Started Wikibase rebuildItemsPerSite in 100k page batches on mwmaint1002 for wikidatawiki. Can be killed at any time, if necessary.
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298554)', diff saved to https://phabricator.wikimedia.org/P20432 and previous config saved to /var/cache/conftool/dbconfig/20220209-205619-ladsgroup.json
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298554)', diff saved to https://phabricator.wikimedia.org/P20431 and previous config saved to /var/cache/conftool/dbconfig/20220209-205606-ladsgroup.json
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:48 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.21 refs T300197 (duration: 00m 51s)
  • 20:47 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.21 refs T300197
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20430 and previous config saved to /var/cache/conftool/dbconfig/20220209-204101-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20429 and previous config saved to /var/cache/conftool/dbconfig/20220209-202557-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298554)', diff saved to https://phabricator.wikimedia.org/P20428 and previous config saved to /var/cache/conftool/dbconfig/20220209-201052-ladsgroup.json
  • 19:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:45 urbanecm: UTC evening B&C window completed
  • 19:45 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.21/extensions/GrowthExperiments/includes/Specials/SpecialMentorDashboard.php: 3da81ec: Mentor dashboard: Mark mentor-tools as beta (T280307) (duration: 00m 49s)
  • 19:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.21/extensions/WikimediaEvents/: 588fa93: Track changes of growthexperiments-mentor-away-timestamp (T280307) (duration: 00m 49s)
  • 19:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/GrowthExperiments/: 9675848: 49202e7: Deploy M2 Mentor settings module (T280307) (duration: 00m 51s)
  • 19:33 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/WikimediaEvents/includes/PrefUpdateInstrumentation.php: a307ac4: Track changes of growthexperiments-mentor-away-timestamp (T280307) (duration: 00m 50s)
  • 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:23 urbanecm: [urbanecm@deploy1002 /srv/mediawiki-staging (master % u=)]$ rm v5.4.2\) # delete untracked file found in staging dir; created by Reedy, contains scap's logo
  • 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298554)', diff saved to https://phabricator.wikimedia.org/P20427 and previous config saved to /var/cache/conftool/dbconfig/20220209-184430-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298554)', diff saved to https://phabricator.wikimedia.org/P20426 and previous config saved to /var/cache/conftool/dbconfig/20220209-184423-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20425 and previous config saved to /var/cache/conftool/dbconfig/20220209-182918-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20424 and previous config saved to /var/cache/conftool/dbconfig/20220209-181413-ladsgroup.json
  • 18:00 elukey: copy calico debs from buster-wikimedia's component/calico-future to bullseye-wikimedia component/calico317
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298554)', diff saved to https://phabricator.wikimedia.org/P20423 and previous config saved to /var/cache/conftool/dbconfig/20220209-175909-ladsgroup.json
  • 17:37 joal@deploy1002: Finished deploy [analytics/refinery@55b229b] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@55b229b] (duration: 07m 04s)
  • 17:34 elukey: upload rsyslog 8.2102.0-2+deb11u1+wmf1 packages to bullseye-wikimedia component/rsyslog-k8s
  • 17:30 joal@deploy1002: Started deploy [analytics/refinery@55b229b] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@55b229b]
  • 17:30 joal@deploy1002: Finished deploy [analytics/refinery@55b229b] (thin): Regular analytics weekly train THIN [analytics/refinery@55b229b] (duration: 00m 07s)
  • 17:30 joal@deploy1002: Started deploy [analytics/refinery@55b229b] (thin): Regular analytics weekly train THIN [analytics/refinery@55b229b]
  • 17:27 joal@deploy1002: Finished deploy [analytics/refinery@55b229b]: Regular analytics weekly train [analytics/refinery@55b229b] (duration: 22m 00s)
  • 17:07 jayme: ran sudo rm /var/run/confd-template/.k8s-ingress-staging*.err on puppetmaster1001 - T300740
  • 17:05 joal@deploy1002: Started deploy [analytics/refinery@55b229b]: Regular analytics weekly train [analytics/refinery@55b229b]
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298554)', diff saved to https://phabricator.wikimedia.org/P20422 and previous config saved to /var/cache/conftool/dbconfig/20220209-163102-ladsgroup.json
  • 16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:21 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-staging,name=eqiad
  • 16:17 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@ddd10b4]: (no justification provided) (duration: 00m 03s)
  • 16:17 otto@deploy1002: Started deploy [airflow-dags/analytics_test@ddd10b4]: (no justification provided)
  • 16:16 otto@deploy1002: Finished deploy [airflow-dags/analytics_test@ddd10b4]: (no justification provided) (duration: 00m 20s)
  • 16:16 otto@deploy1002: Started deploy [airflow-dags/analytics_test@ddd10b4]: (no justification provided)
  • 15:57 jayme: ran sudo rm /var/run/confd-template/.k8s-ingress-staging*.err on puppetmaster2001 - T300740
  • 15:56 jayme: restarting pybal on lvs1015,lvs2009 - T300740
  • 15:44 jbond: change puppet hiera preference site vs site/role gerrit:761339
  • 15:43 jayme@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=kubernetes-staging,service=kubesvc
  • 15:31 jayme: restarting pybal on lvs2010,lvs1020 - T300740
  • 15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298554)', diff saved to https://phabricator.wikimedia.org/P20420 and previous config saved to /var/cache/conftool/dbconfig/20220209-152522-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20419 and previous config saved to /var/cache/conftool/dbconfig/20220209-151017-ladsgroup.json
  • 15:06 moritzm: imported jenkins 2.319.3 to thirdparty/ci T301361
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20418 and previous config saved to /var/cache/conftool/dbconfig/20220209-145513-ladsgroup.json
  • 14:43 ema: prometheus: remove atskafka target files - '/srv/prometheus/ops/targets/atskafka_*' T247497
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298554)', diff saved to https://phabricator.wikimedia.org/P20416 and previous config saved to /var/cache/conftool/dbconfig/20220209-144008-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T300510)', diff saved to https://phabricator.wikimedia.org/P20415 and previous config saved to /var/cache/conftool/dbconfig/20220209-143642-ladsgroup.json
  • 14:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2126.codfw.wmnet with OS bullseye
  • 14:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 14:22 reedy@deploy1002: Finished scap: Downgrading symfony/console (v5.4.3 => v5.4.2) T301320 (duration: 01m 31s)
  • 14:20 reedy@deploy1002: Started scap: Downgrading symfony/console (v5.4.3 => v5.4.2) T301320
  • 13:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2126.codfw.wmnet with OS bullseye
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T300510)', diff saved to https://phabricator.wikimedia.org/P20414 and previous config saved to /var/cache/conftool/dbconfig/20220209-135515-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 13:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Migrate to bullseye (T300510)
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Migrate to bullseye (T300510)
  • 13:48 jelto: update scap to 4.3.1 on all hosts - T301307
  • 13:38 reedy@deploy1002: Finished scap: Downgrading symfony/console \(v5.4.3 => v5.4.2\) T301320 (duration: 01m 34s)
  • 13:36 reedy@deploy1002: Started scap: Downgrading symfony/console \(v5.4.3 => v5.4.2\) T301320
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298554)', diff saved to https://phabricator.wikimedia.org/P20412 and previous config saved to /var/cache/conftool/dbconfig/20220209-131938-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:41 Lucas_WMDE: UTC morning backport+config window done
  • 12:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: sawikisource: Add audio book namespace (T282970) (duration: 00m 50s)
  • 12:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:14 lucaswerkmeister-wmde@deploy1002: Synchronized multiversion/MWRealm.php: Config: Stop writing to $wmfRealm (T45956) (3/3) (duration: 00m 49s)
  • 12:13 lucaswerkmeister-wmde@deploy1002: Synchronized multiversion/buildConfigCache.php: Config: Stop writing to $wmfRealm (T45956) (2/3) (duration: 00m 49s)
  • 12:11 lucaswerkmeister-wmde@deploy1002: Synchronized tests/loggingTest.php: Config: Stop writing to $wmfRealm (T45956) (1/3) (duration: 01m 38s)
  • 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300775)', diff saved to https://phabricator.wikimedia.org/P20411 and previous config saved to /var/cache/conftool/dbconfig/20220209-112029-marostegui.json
  • 11:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 11:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 11:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-fe[2005-2008].codfw.wmnet
  • 10:50 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-fe[2005-2008].codfw.wmnet
  • 10:45 akosiaris: T300568 upload prometheus-etherpad-exporter_0.5_amd64 to apt.wikimedia.org bullseye-wikimedia/main
  • 10:35 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 10:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 10:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 10:32 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 10:25 jelto@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 22s)
  • 10:25 jelto@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 10:20 jelto: update scap to 4.3.1 on A:restbase-canary - T301307
  • 10:17 jelto: update scap to 4.3.1 on A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary - T301307
  • 10:16 ariel@deploy1002: Finished deploy [dumps/dumps@9993036]: fix up default api jobs entry for siteinfo v2 (duration: 00m 03s)
  • 10:15 ariel@deploy1002: Started deploy [dumps/dumps@9993036]: fix up default api jobs entry for siteinfo v2
  • 10:15 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ms-fe[2005-2008].codfw.wmnet
  • 10:14 volans: uploaded python3-wmflib_1.0.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 10:11 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-fe[2005-2008].codfw.wmnet
  • 10:03 akosiaris: T300568 upload prometheus-etherpad-exporter_0.4_amd64 to apt.wikimedia.org bullseye-wikimedia/main
  • 10:02 Emperor: rolling restart of swift frontends T301251
  • 09:46 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:45 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:45 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:45 elukey: update my ssh key on all network devices (will commit only when the diff is my key only)
  • 09:44 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:41 ema: cp3050: stop and disable atskafka-webrequest.service T247497
  • 09:15 ema: cp3050: ats-backend-restart to set the number of allowed Lua states back from 64 to 256 (default) T265625
  • 08:21 dcausse: restarting blazegraph on wdqs1004 (JVM stuck for 5 hours)
  • 07:55 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 07:42 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove logpager group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20410 and previous config saved to /var/cache/conftool/dbconfig/20220209-073528-marostegui.json
  • 04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 03:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 03:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20407 and previous config saved to /var/cache/conftool/dbconfig/20220209-034800-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20406 and previous config saved to /var/cache/conftool/dbconfig/20220209-033255-ladsgroup.json
  • 03:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P20405 and previous config saved to /var/cache/conftool/dbconfig/20220209-031750-ladsgroup.json
  • 03:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20404 and previous config saved to /var/cache/conftool/dbconfig/20220209-030245-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298554)', diff saved to https://phabricator.wikimedia.org/P20403 and previous config saved to /var/cache/conftool/dbconfig/20220209-023446-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 11 hosts with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 11 hosts with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance

2022-02-08

  • 23:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2055.codfw.wmnet with OS buster
  • 23:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2054.codfw.wmnet with OS buster
  • 23:22 tzatziki: removing 1 file for legal compliance
  • 23:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2055.codfw.wmnet with OS buster
  • 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2053.codfw.wmnet with OS buster
  • 23:17 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2054.codfw.wmnet with OS buster
  • 23:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2052.codfw.wmnet with OS buster
  • 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2053.codfw.wmnet with OS buster
  • 22:44 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: sync on main
  • 22:42 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply on main
  • 22:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2052.codfw.wmnet with OS buster
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20402 and previous config saved to /var/cache/conftool/dbconfig/20220208-221545-marostegui.json
  • 22:12 topranks: doing planned 1-by-1 shutdown of ports xe-0/1/1, xe-0/1/2 and xe-0/1/9 on cr2-esams, to test the reliability of each, following user reports of issues at AMS-IX.
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20401 and previous config saved to /var/cache/conftool/dbconfig/20220208-220041-marostegui.json
  • 21:59 ryankemper: T294805 elastic10[68-83] erroneously weren't in pybal, added them just now: `sudo confctl select 'cluster=elasticsearch' set/pooled=yes:weight=10` (there are no hosts in the `conftool-data` list that we want depooled, so we're okay setting all to pooled w/ equal weight)
  • 21:59 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch
  • 21:58 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: cluster=elasticsearch,name=elastic1*
  • 21:53 ryankemper@puppetmaster1001: conftool action : GET; selector: service=search
  • 21:52 ryankemper@puppetmaster1001: conftool action : GET; selector: service=search
  • 21:47 ryankemper: [Elastic] `ryankemper@elastic1081:~$ sudo systemctl restart elasticsearch_6*psi*` (9600 but not 9200 seemed to be having connectivity issues)
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P20400 and previous config saved to /var/cache/conftool/dbconfig/20220208-214536-marostegui.json
  • 21:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20399 and previous config saved to /var/cache/conftool/dbconfig/20220208-213031-marostegui.json
  • 21:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300402)', diff saved to https://phabricator.wikimedia.org/P20398 and previous config saved to /var/cache/conftool/dbconfig/20220208-212558-marostegui.json
  • 21:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 21:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 21:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20397 and previous config saved to /var/cache/conftool/dbconfig/20220208-212550-marostegui.json
  • 21:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20396 and previous config saved to /var/cache/conftool/dbconfig/20220208-211046-marostegui.json
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P20395 and previous config saved to /var/cache/conftool/dbconfig/20220208-205541-marostegui.json
  • 20:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:52 jhuneidi@deploy1002: Finished scap: sync again in attempt to deploy 1.38.0-wmf.21 to group0 (duration: 16m 17s)
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2051.codfw.wmnet with OS buster
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20394 and previous config saved to /var/cache/conftool/dbconfig/20220208-204036-marostegui.json
  • 20:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298554)', diff saved to https://phabricator.wikimedia.org/P20393 and previous config saved to /var/cache/conftool/dbconfig/20220208-203634-ladsgroup.json
  • 20:36 jhuneidi@deploy1002: Started scap: sync again in attempt to deploy 1.38.0-wmf.21 to group0
  • 20:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20392 and previous config saved to /var/cache/conftool/dbconfig/20220208-203529-marostegui.json
  • 20:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300402)', diff saved to https://phabricator.wikimedia.org/P20391 and previous config saved to /var/cache/conftool/dbconfig/20220208-203521-marostegui.json
  • 20:33 ryankemper: T294805 Banned `elastic10[32-47]` from main, omega, and psi elasticsearch clusters. Shards are relocating on main and omega clusters as expected, but they don't seem to be moving on psi. Investigating that now. Might have to do with row allocation constraints, but unsure at this point
  • 20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2050.codfw.wmnet with OS buster
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20390 and previous config saved to /var/cache/conftool/dbconfig/20220208-202127-ladsgroup.json
  • 20:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20389 and previous config saved to /var/cache/conftool/dbconfig/20220208-202016-marostegui.json
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:17 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.21 refs T300197
  • 20:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2051.codfw.wmnet with OS buster
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P20388 and previous config saved to /var/cache/conftool/dbconfig/20220208-200621-ladsgroup.json
  • 20:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P20387 and previous config saved to /var/cache/conftool/dbconfig/20220208-200512-marostegui.json
  • 20:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2049.codfw.wmnet with OS buster
  • 19:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2050.codfw.wmnet with OS buster
  • 19:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2048.codfw.wmnet with OS buster
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298554)', diff saved to https://phabricator.wikimedia.org/P20386 and previous config saved to /var/cache/conftool/dbconfig/20220208-195115-ladsgroup.json
  • 19:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300402)', diff saved to https://phabricator.wikimedia.org/P20385 and previous config saved to /var/cache/conftool/dbconfig/20220208-195007-marostegui.json
  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300402)', diff saved to https://phabricator.wikimedia.org/P20384 and previous config saved to /var/cache/conftool/dbconfig/20220208-194528-marostegui.json
  • 19:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300402)', diff saved to https://phabricator.wikimedia.org/P20383 and previous config saved to /var/cache/conftool/dbconfig/20220208-194520-marostegui.json
  • 19:32 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2049.codfw.wmnet with OS buster
  • 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20382 and previous config saved to /var/cache/conftool/dbconfig/20220208-193016-marostegui.json
  • 19:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2047.codfw.wmnet with OS buster
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2048.codfw.wmnet with OS buster
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2046.codfw.wmnet with OS buster
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298554)', diff saved to https://phabricator.wikimedia.org/P20381 and previous config saved to /var/cache/conftool/dbconfig/20220208-192055-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 19:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298554)', diff saved to https://phabricator.wikimedia.org/P20380 and previous config saved to /var/cache/conftool/dbconfig/20220208-192047-ladsgroup.json
  • 19:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P20379 and previous config saved to /var/cache/conftool/dbconfig/20220208-191511-marostegui.json
  • 19:12 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.19 (duration: 03m 12s)
  • 19:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 19:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:09 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.21 refs T300197 (duration: 39m 34s)
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P20378 and previous config saved to /var/cache/conftool/dbconfig/20220208-190542-ladsgroup.json
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300402)', diff saved to https://phabricator.wikimedia.org/P20377 and previous config saved to /var/cache/conftool/dbconfig/20220208-190006-marostegui.json
  • 18:58 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@49ba844]: query_clicks: resolve parse error in comment (duration: 02m 02s)
  • 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:56 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@49ba844]: query_clicks: resolve parse error in comment
  • 18:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2047.codfw.wmnet with OS buster
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300402)', diff saved to https://phabricator.wikimedia.org/P20376 and previous config saved to /var/cache/conftool/dbconfig/20220208-185420-marostegui.json
  • 18:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 18:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2046.codfw.wmnet with OS buster
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2045.codfw.wmnet with OS buster
  • 18:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 18:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2044.codfw.wmnet with OS buster
  • 18:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 18:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P20375 and previous config saved to /var/cache/conftool/dbconfig/20220208-185037-ladsgroup.json
  • 18:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300402)', diff saved to https://phabricator.wikimedia.org/P20374 and previous config saved to /var/cache/conftool/dbconfig/20220208-184832-marostegui.json
  • 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298554)', diff saved to https://phabricator.wikimedia.org/P20373 and previous config saved to /var/cache/conftool/dbconfig/20220208-183532-ladsgroup.json
  • 18:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20372 and previous config saved to /var/cache/conftool/dbconfig/20220208-183328-marostegui.json
  • 18:29 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.21 refs T300197
  • 18:22 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@ceff02f]: query_clicks: adjust start_date and catchup (duration: 02m 03s)
  • 18:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2045.codfw.wmnet with OS buster
  • 18:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2044.codfw.wmnet with OS buster
  • 18:20 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@ceff02f]: query_clicks: adjust start_date and catchup
  • 18:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P20371 and previous config saved to /var/cache/conftool/dbconfig/20220208-181823-marostegui.json
  • 18:13 moritzm: installing expat security updates
  • 18:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2043.codfw.wmnet with OS buster
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298554)', diff saved to https://phabricator.wikimedia.org/P20370 and previous config saved to /var/cache/conftool/dbconfig/20220208-180810-ladsgroup.json
  • 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298554)', diff saved to https://phabricator.wikimedia.org/P20369 and previous config saved to /var/cache/conftool/dbconfig/20220208-180803-ladsgroup.json
  • 18:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300402)', diff saved to https://phabricator.wikimedia.org/P20368 and previous config saved to /var/cache/conftool/dbconfig/20220208-180316-marostegui.json
  • 17:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2042.codfw.wmnet with OS buster
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300402)', diff saved to https://phabricator.wikimedia.org/P20367 and previous config saved to /var/cache/conftool/dbconfig/20220208-175844-marostegui.json
  • 17:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 17:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20366 and previous config saved to /var/cache/conftool/dbconfig/20220208-175837-marostegui.json
  • 17:58 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@79cb98e]: move query clicks from oozie to airflow (duration: 02m 01s)
  • 17:56 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp4031.ulsfo.wmnet
  • 17:56 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@79cb98e]: move query clicks from oozie to airflow
  • 17:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P20365 and previous config saved to /var/cache/conftool/dbconfig/20220208-175258-ladsgroup.json
  • 17:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20364 and previous config saved to /var/cache/conftool/dbconfig/20220208-174332-marostegui.json
  • 17:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2043.codfw.wmnet with OS buster
  • 17:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2041.codfw.wmnet with OS buster
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P20363 and previous config saved to /var/cache/conftool/dbconfig/20220208-173753-ladsgroup.json
  • 17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: Maintenance
  • 17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: Maintenance
  • 17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20362 and previous config saved to /var/cache/conftool/dbconfig/20220208-173611-marostegui.json
  • 17:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2042.codfw.wmnet with OS buster
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P20361 and previous config saved to /var/cache/conftool/dbconfig/20220208-172827-marostegui.json
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2040.codfw.wmnet with OS buster
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298554)', diff saved to https://phabricator.wikimedia.org/P20360 and previous config saved to /var/cache/conftool/dbconfig/20220208-172248-ladsgroup.json
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20359 and previous config saved to /var/cache/conftool/dbconfig/20220208-172106-marostegui.json
  • 17:17 rzl: rzl@cumin1001:~$ sudo cumin A:mw "enable-puppet T273323"
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20358 and previous config saved to /var/cache/conftool/dbconfig/20220208-171323-marostegui.json
  • 17:11 rzl: rzl@cumin1001:~$ sudo cumin A:mw "disable-puppet T273323"
  • 17:11 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@88cdfdc]: Deploy rdf-streaming-updater reconciliation job (duration: 02m 01s)
  • 17:09 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@88cdfdc]: Deploy rdf-streaming-updater reconciliation job
  • 17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2041.codfw.wmnet with OS buster
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300402)', diff saved to https://phabricator.wikimedia.org/P20357 and previous config saved to /var/cache/conftool/dbconfig/20220208-170812-marostegui.json
  • 17:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 17:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300402)', diff saved to https://phabricator.wikimedia.org/P20356 and previous config saved to /var/cache/conftool/dbconfig/20220208-170805-marostegui.json
  • 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2039.codfw.wmnet with OS buster
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P20355 and previous config saved to /var/cache/conftool/dbconfig/20220208-170601-marostegui.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298554)', diff saved to https://phabricator.wikimedia.org/P20354 and previous config saved to /var/cache/conftool/dbconfig/20220208-165445-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298554)', diff saved to https://phabricator.wikimedia.org/P20353 and previous config saved to /var/cache/conftool/dbconfig/20220208-165436-ladsgroup.json
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS buster
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20352 and previous config saved to /var/cache/conftool/dbconfig/20220208-165300-marostegui.json
  • 16:51 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mc2040.codfw.wmnet with OS buster
  • 16:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2040.codfw.wmnet with OS buster
  • 16:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20351 and previous config saved to /var/cache/conftool/dbconfig/20220208-165057-marostegui.json
  • 16:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc2038.codfw.wmnet with OS buster
  • 16:45 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: Choose wikiversions.php file relative to MWMultiVersion.php (revived) (duration: 00m 49s)
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P20350 and previous config saved to /var/cache/conftool/dbconfig/20220208-163932-ladsgroup.json
  • 16:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P20349 and previous config saved to /var/cache/conftool/dbconfig/20220208-163755-marostegui.json
  • 16:37 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:37 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2039.codfw.wmnet with OS buster
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P20348 and previous config saved to /var/cache/conftool/dbconfig/20220208-162427-ladsgroup.json
  • 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300402)', diff saved to https://phabricator.wikimedia.org/P20347 and previous config saved to /var/cache/conftool/dbconfig/20220208-162250-marostegui.json
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300402)', diff saved to https://phabricator.wikimedia.org/P20346 and previous config saved to /var/cache/conftool/dbconfig/20220208-161812-marostegui.json
  • 16:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300402)', diff saved to https://phabricator.wikimedia.org/P20345 and previous config saved to /var/cache/conftool/dbconfig/20220208-161805-marostegui.json
  • 16:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc2038.codfw.wmnet with OS buster
  • 16:13 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298554)', diff saved to https://phabricator.wikimedia.org/P20344 and previous config saved to /var/cache/conftool/dbconfig/20220208-160922-ladsgroup.json
  • 16:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20343 and previous config saved to /var/cache/conftool/dbconfig/20220208-160300-marostegui.json
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P20342 and previous config saved to /var/cache/conftool/dbconfig/20220208-154755-marostegui.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298554)', diff saved to https://phabricator.wikimedia.org/P20341 and previous config saved to /var/cache/conftool/dbconfig/20220208-154049-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298554)', diff saved to https://phabricator.wikimedia.org/P20340 and previous config saved to /var/cache/conftool/dbconfig/20220208-154042-ladsgroup.json
  • 15:33 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 15:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300402)', diff saved to https://phabricator.wikimedia.org/P20339 and previous config saved to /var/cache/conftool/dbconfig/20220208-153251-marostegui.json
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300402)', diff saved to https://phabricator.wikimedia.org/P20338 and previous config saved to /var/cache/conftool/dbconfig/20220208-152812-marostegui.json
  • 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 15:27 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20337 and previous config saved to /var/cache/conftool/dbconfig/20220208-152536-ladsgroup.json
  • 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300402)', diff saved to https://phabricator.wikimedia.org/P20336 and previous config saved to /var/cache/conftool/dbconfig/20220208-152525-marostegui.json
  • 15:18 Emperor: depooling ms-fe200[5-8] T301251
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P20335 and previous config saved to /var/cache/conftool/dbconfig/20220208-151032-ladsgroup.json
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20334 and previous config saved to /var/cache/conftool/dbconfig/20220208-151020-marostegui.json
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20333 and previous config saved to /var/cache/conftool/dbconfig/20220208-145731-marostegui.json
  • 14:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300775)', diff saved to https://phabricator.wikimedia.org/P20332 and previous config saved to /var/cache/conftool/dbconfig/20220208-145724-marostegui.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298554)', diff saved to https://phabricator.wikimedia.org/P20331 and previous config saved to /var/cache/conftool/dbconfig/20220208-145527-ladsgroup.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P20330 and previous config saved to /var/cache/conftool/dbconfig/20220208-145516-marostegui.json
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20329 and previous config saved to /var/cache/conftool/dbconfig/20220208-144219-marostegui.json
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300402)', diff saved to https://phabricator.wikimedia.org/P20328 and previous config saved to /var/cache/conftool/dbconfig/20220208-144011-marostegui.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300402)', diff saved to https://phabricator.wikimedia.org/P20327 and previous config saved to /var/cache/conftool/dbconfig/20220208-143545-marostegui.json
  • 14:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 14:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 14:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 14:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300402)', diff saved to https://phabricator.wikimedia.org/P20326 and previous config saved to /var/cache/conftool/dbconfig/20220208-143302-marostegui.json
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298554)', diff saved to https://phabricator.wikimedia.org/P20325 and previous config saved to /var/cache/conftool/dbconfig/20220208-142815-ladsgroup.json
  • 14:28 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 14:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298554)', diff saved to https://phabricator.wikimedia.org/P20324 and previous config saved to /var/cache/conftool/dbconfig/20220208-142808-ladsgroup.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P20323 and previous config saved to /var/cache/conftool/dbconfig/20220208-142714-marostegui.json
  • 14:26 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2001.codfw.wmnet with OS bullseye
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20322 and previous config saved to /var/cache/conftool/dbconfig/20220208-141757-marostegui.json
  • 14:17 godog: update PERC firmware on thanos-be2001 - T288937
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P20321 and previous config saved to /var/cache/conftool/dbconfig/20220208-141303-ladsgroup.json
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T300775)', diff saved to https://phabricator.wikimedia.org/P20320 and previous config saved to /var/cache/conftool/dbconfig/20220208-141210-marostegui.json
  • 14:07 godog: update NIC firmware on thanos-be2001 - T288937
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P20319 and previous config saved to /var/cache/conftool/dbconfig/20220208-140252-marostegui.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P20318 and previous config saved to /var/cache/conftool/dbconfig/20220208-135758-ladsgroup.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300402)', diff saved to https://phabricator.wikimedia.org/P20317 and previous config saved to /var/cache/conftool/dbconfig/20220208-134748-marostegui.json
  • 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300402)', diff saved to https://phabricator.wikimedia.org/P20316 and previous config saved to /var/cache/conftool/dbconfig/20220208-134324-marostegui.json
  • 13:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298554)', diff saved to https://phabricator.wikimedia.org/P20315 and previous config saved to /var/cache/conftool/dbconfig/20220208-134254-ladsgroup.json
  • 13:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300402)', diff saved to https://phabricator.wikimedia.org/P20314 and previous config saved to /var/cache/conftool/dbconfig/20220208-134022-marostegui.json
  • 13:37 moritzm: migrating instances off ganeti1021
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T300775)', diff saved to https://phabricator.wikimedia.org/P20313 and previous config saved to /var/cache/conftool/dbconfig/20220208-133558-marostegui.json
  • 13:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300775)', diff saved to https://phabricator.wikimedia.org/P20312 and previous config saved to /var/cache/conftool/dbconfig/20220208-133550-marostegui.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20310 and previous config saved to /var/cache/conftool/dbconfig/20220208-132517-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20309 and previous config saved to /var/cache/conftool/dbconfig/20220208-132045-marostegui.json
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298554)', diff saved to https://phabricator.wikimedia.org/P20308 and previous config saved to /var/cache/conftool/dbconfig/20220208-131430-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300510)', diff saved to https://phabricator.wikimedia.org/P20307 and previous config saved to /var/cache/conftool/dbconfig/20220208-131427-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298554)', diff saved to https://phabricator.wikimedia.org/P20306 and previous config saved to /var/cache/conftool/dbconfig/20220208-131319-ladsgroup.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P20305 and previous config saved to /var/cache/conftool/dbconfig/20220208-131012-marostegui.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P20304 and previous config saved to /var/cache/conftool/dbconfig/20220208-130541-marostegui.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P20303 and previous config saved to /var/cache/conftool/dbconfig/20220208-125922-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P20302 and previous config saved to /var/cache/conftool/dbconfig/20220208-125814-ladsgroup.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300402)', diff saved to https://phabricator.wikimedia.org/P20301 and previous config saved to /var/cache/conftool/dbconfig/20220208-125508-marostegui.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T300775)', diff saved to https://phabricator.wikimedia.org/P20300 and previous config saved to /var/cache/conftool/dbconfig/20220208-125036-marostegui.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P20299 and previous config saved to /var/cache/conftool/dbconfig/20220208-124418-ladsgroup.json
  • 12:43 Amir1: shut down dbmonitor1002 (T297605)
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P20298 and previous config saved to /var/cache/conftool/dbconfig/20220208-124309-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on dbmonitor1002.wikimedia.org with reason: Host will be shutdown in a week (T297605)
  • 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on dbmonitor1002.wikimedia.org with reason: Host will be shutdown in a week (T297605)
  • 12:37 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-be2001.codfw.wmnet with OS bullseye
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300510)', diff saved to https://phabricator.wikimedia.org/P20297 and previous config saved to /var/cache/conftool/dbconfig/20220208-122913-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298554)', diff saved to https://phabricator.wikimedia.org/P20296 and previous config saved to /var/cache/conftool/dbconfig/20220208-122805-ladsgroup.json
  • 12:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1011.eqiad.wmnet with OS buster
  • 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1182.eqiad.wmnet with OS bullseye
  • 12:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2010.codfw.wmnet with reason: Decommissioning
  • 12:19 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2010.codfw.wmnet with reason: Decommissioning
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T300775)', diff saved to https://phabricator.wikimedia.org/P20295 and previous config saved to /var/cache/conftool/dbconfig/20220208-121430-marostegui.json
  • 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300775)', diff saved to https://phabricator.wikimedia.org/P20294 and previous config saved to /var/cache/conftool/dbconfig/20220208-121422-marostegui.json
  • 12:11 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2010.wmnet
  • 12:11 hnowlan: Running c-foreach-nt decommission on restbase2010 in advance of decommissioning
  • 12:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300402)', diff saved to https://phabricator.wikimedia.org/P20293 and previous config saved to /var/cache/conftool/dbconfig/20220208-120603-marostegui.json
  • 12:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300402)', diff saved to https://phabricator.wikimedia.org/P20292 and previous config saved to /var/cache/conftool/dbconfig/20220208-120556-marostegui.json
  • 12:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d9902a4: cowikimedia: Let admins grant confirmed and accountcreator flags (T300948) (duration: 00m 50s)
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298554)', diff saved to https://phabricator.wikimedia.org/P20291 and previous config saved to /var/cache/conftool/dbconfig/20220208-120102-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298554)', diff saved to https://phabricator.wikimedia.org/P20290 and previous config saved to /var/cache/conftool/dbconfig/20220208-120054-ladsgroup.json
  • 11:59 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS buster
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20289 and previous config saved to /var/cache/conftool/dbconfig/20220208-115918-marostegui.json
  • 11:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2019.wmnet
  • 11:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2020.wmnet
  • 11:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2019.codfw.wmnet with OS buster
  • 11:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1182.eqiad.wmnet with OS bullseye
  • 11:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2020.codfw.wmnet with OS buster
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20288 and previous config saved to /var/cache/conftool/dbconfig/20220208-115051-marostegui.json
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300510)', diff saved to https://phabricator.wikimedia.org/P20287 and previous config saved to /var/cache/conftool/dbconfig/20220208-114639-ladsgroup.json
  • 11:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P20286 and previous config saved to /var/cache/conftool/dbconfig/20220208-114549-ladsgroup.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P20285 and previous config saved to /var/cache/conftool/dbconfig/20220208-114413-marostegui.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300510)', diff saved to https://phabricator.wikimedia.org/P20284 and previous config saved to /var/cache/conftool/dbconfig/20220208-113910-ladsgroup.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P20283 and previous config saved to /var/cache/conftool/dbconfig/20220208-113547-marostegui.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P20282 and previous config saved to /var/cache/conftool/dbconfig/20220208-113045-ladsgroup.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T300775)', diff saved to https://phabricator.wikimedia.org/P20281 and previous config saved to /var/cache/conftool/dbconfig/20220208-112909-marostegui.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P20280 and previous config saved to /var/cache/conftool/dbconfig/20220208-112406-ladsgroup.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300402)', diff saved to https://phabricator.wikimedia.org/P20279 and previous config saved to /var/cache/conftool/dbconfig/20220208-112042-marostegui.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298554)', diff saved to https://phabricator.wikimedia.org/P20278 and previous config saved to /var/cache/conftool/dbconfig/20220208-111540-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P20277 and previous config saved to /var/cache/conftool/dbconfig/20220208-110901-ladsgroup.json
  • 11:06 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS buster
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300402)', diff saved to https://phabricator.wikimedia.org/P20276 and previous config saved to /var/cache/conftool/dbconfig/20220208-110154-marostegui.json
  • 11:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300402)', diff saved to https://phabricator.wikimedia.org/P20275 and previous config saved to /var/cache/conftool/dbconfig/20220208-110147-marostegui.json
  • 10:59 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS buster
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T300775)', diff saved to https://phabricator.wikimedia.org/P20274 and previous config saved to /var/cache/conftool/dbconfig/20220208-105453-marostegui.json
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20273 and previous config saved to /var/cache/conftool/dbconfig/20220208-105440-marostegui.json
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300510)', diff saved to https://phabricator.wikimedia.org/P20272 and previous config saved to /var/cache/conftool/dbconfig/20220208-105356-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1162.eqiad.wmnet with OS bullseye
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20271 and previous config saved to /var/cache/conftool/dbconfig/20220208-104642-marostegui.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298554)', diff saved to https://phabricator.wikimedia.org/P20270 and previous config saved to /var/cache/conftool/dbconfig/20220208-104421-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298554)', diff saved to https://phabricator.wikimedia.org/P20269 and previous config saved to /var/cache/conftool/dbconfig/20220208-104414-ladsgroup.json
  • 10:43 elukey: update pcc facts
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P20268 and previous config saved to /var/cache/conftool/dbconfig/20220208-103935-marostegui.json
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20267 and previous config saved to /var/cache/conftool/dbconfig/20220208-103137-marostegui.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P20266 and previous config saved to /var/cache/conftool/dbconfig/20220208-102909-ladsgroup.json
  • 10:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P20265 and previous config saved to /var/cache/conftool/dbconfig/20220208-102430-marostegui.json
  • 10:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1162.eqiad.wmnet with OS bullseye
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300402)', diff saved to https://phabricator.wikimedia.org/P20264 and previous config saved to /var/cache/conftool/dbconfig/20220208-101631-marostegui.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P20263 and previous config saved to /var/cache/conftool/dbconfig/20220208-101404-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300510)', diff saved to https://phabricator.wikimedia.org/P20262 and previous config saved to /var/cache/conftool/dbconfig/20220208-101238-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:09 jayme: updates scap to 4.3.0 on all hosts - T300804
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20261 and previous config saved to /var/cache/conftool/dbconfig/20220208-100926-marostegui.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20260 and previous config saved to /var/cache/conftool/dbconfig/20220208-095916-marostegui.json
  • 09:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300775)', diff saved to https://phabricator.wikimedia.org/P20259 and previous config saved to /var/cache/conftool/dbconfig/20220208-095909-marostegui.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298554)', diff saved to https://phabricator.wikimedia.org/P20258 and previous config saved to /var/cache/conftool/dbconfig/20220208-095900-ladsgroup.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300402)', diff saved to https://phabricator.wikimedia.org/P20257 and previous config saved to /var/cache/conftool/dbconfig/20220208-095427-marostegui.json
  • 09:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300402)', diff saved to https://phabricator.wikimedia.org/P20256 and previous config saved to /var/cache/conftool/dbconfig/20220208-095420-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20255 and previous config saved to /var/cache/conftool/dbconfig/20220208-094358-marostegui.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20254 and previous config saved to /var/cache/conftool/dbconfig/20220208-093915-marostegui.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298554)', diff saved to https://phabricator.wikimedia.org/P20253 and previous config saved to /var/cache/conftool/dbconfig/20220208-093315-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P20252 and previous config saved to /var/cache/conftool/dbconfig/20220208-092853-marostegui.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P20251 and previous config saved to /var/cache/conftool/dbconfig/20220208-092410-marostegui.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T300775)', diff saved to https://phabricator.wikimedia.org/P20250 and previous config saved to /var/cache/conftool/dbconfig/20220208-091349-marostegui.json
  • 09:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300402)', diff saved to https://phabricator.wikimedia.org/P20249 and previous config saved to /var/cache/conftool/dbconfig/20220208-090906-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300402)', diff saved to https://phabricator.wikimedia.org/P20248 and previous config saved to /var/cache/conftool/dbconfig/20220208-084851-marostegui.json
  • 08:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 08:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T300775)', diff saved to https://phabricator.wikimedia.org/P20247 and previous config saved to /var/cache/conftool/dbconfig/20220208-083815-marostegui.json
  • 08:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20246 and previous config saved to /var/cache/conftool/dbconfig/20220208-083808-marostegui.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P20245 and previous config saved to /var/cache/conftool/dbconfig/20220208-082303-marostegui.json
  • 08:20 marostegui: Stop MySQL on db1115 to backup tendril T297605
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P20244 and previous config saved to /var/cache/conftool/dbconfig/20220208-080758-marostegui.json
  • 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300402)', diff saved to https://phabricator.wikimedia.org/P20243 and previous config saved to /var/cache/conftool/dbconfig/20220208-080709-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20242 and previous config saved to /var/cache/conftool/dbconfig/20220208-075254-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P20241 and previous config saved to /var/cache/conftool/dbconfig/20220208-075204-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P20240 and previous config saved to /var/cache/conftool/dbconfig/20220208-073659-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T300402)', diff saved to https://phabricator.wikimedia.org/P20239 and previous config saved to /var/cache/conftool/dbconfig/20220208-072155-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300402)', diff saved to https://phabricator.wikimedia.org/P20238 and previous config saved to /var/cache/conftool/dbconfig/20220208-070339-marostegui.json
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2134.codfw.wmnet with OS bullseye
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db2134.codfw.wmnet with OS bullseye
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T300775)', diff saved to https://phabricator.wikimedia.org/P20237 and previous config saved to /var/cache/conftool/dbconfig/20220208-060943-marostegui.json
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Remove contributions group from s1 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20236 and previous config saved to /var/cache/conftool/dbconfig/20220208-060310-marostegui.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:12 ryankemper: T294805 Re-enabling puppet across eqiad elastic fleet: `ryankemper@cumin1001:~$ sudo cumin -b 8 'elastic1*' 'sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent'` tmux session `elastic`
  • 00:12 ryankemper: T294805 old psi masters are out, done with all elastic master operations
  • 00:05 ryankemper: T294805 new psi masters `elastic1073`, `elastic1075`, and `elastic1083` are in
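
Most entries for this day are `dbctl commit` lines that follow a depool, staged-repool pattern per database instance (for example db1166: depooled at 09:54, then "Repooling after maintenance" at 10:16, 10:31, 10:46 and 11:01). The sketch below shows one way such lines could be pulled apart for review. The regular expressions and the `DbctlEvent` and `parse_dbctl` names are hypothetical, inferred only from the entry shapes visible in this log; they are not part of dbctl, conftool, or any Wikimedia tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed entry shape: "<HH:MM> <user>[@<host>]: <message>", optionally
# preceded by the list bullet. Inferred from this log, not from a spec.
ENTRY_RE = re.compile(
    r"^\s*•?\s*(?P<time>\d{2}:\d{2}) "
    r"(?P<user>[\w.-]+)(?:@(?P<host>[\w.-]+))?: "
    r"(?P<message>.*)$"
)

# Assumed dbctl message shape; only the two actions seen above are matched.
DBCTL_RE = re.compile(
    r"dbctl commit \(dc=(?P<dc>\w+)\): "
    r"'(?P<action>Depooling|Repooling after maintenance) "
    r"(?P<instance>[\w:]+)(?: \((?P<task>T\d+)\))?'"
)


class DbctlEvent(NamedTuple):
    time: str
    user: str
    action: str
    instance: str
    task: Optional[str]  # None when the entry carries no task ID


def parse_dbctl(line: str) -> Optional[DbctlEvent]:
    """Return a DbctlEvent for depool/repool entries, else None."""
    entry = ENTRY_RE.match(line)
    if entry is None:
        return None
    commit = DBCTL_RE.search(entry["message"])
    if commit is None:
        return None
    return DbctlEvent(
        entry["time"],
        entry["user"],
        commit["action"],
        commit["instance"],
        commit["task"],
    )


if __name__ == "__main__":
    # Lines copied from the log above, newest first as in the log itself.
    sample = [
        "  • 10:43 elukey: update pcc facts",
        "  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P20267 and previous config saved to /var/cache/conftool/dbconfig/20220208-103137-marostegui.json",
        "  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300402)', diff saved to https://phabricator.wikimedia.org/P20264 and previous config saved to /var/cache/conftool/dbconfig/20220208-101631-marostegui.json",
        "  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300402)', diff saved to https://phabricator.wikimedia.org/P20257 and previous config saved to /var/cache/conftool/dbconfig/20220208-095427-marostegui.json",
    ]
    # Walk oldest-first so each instance reads as depool, then repool steps.
    for line in reversed(sample):
        event = parse_dbctl(line)
        if event is not None:
            print(event.time, event.instance, event.action, event.task)
```

Entries that are not depool or repool commits (such as the 06:03 "Remove contributions group" commit) return `None` under these assumptions and would need their own pattern.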

2022-02-07

  • 23:39 ryankemper: T294805 Removed old masters `elastic1034` and `elastic1038` (and `elastic1040` was removed earlier)
  • 23:35 ryankemper: T294805 Bringing in new omega master `elastic1057`
  • 23:31 ryankemper: T294805 Bringing in new omega master `elastic1076`
  • 23:27 ryankemper: T294805 Bringing in new master `elastic1068`
  • 23:27 ryankemper: T294805 Main search cluster all done, proceeding to `omega` cluster
  • 23:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:17 cwhite: end opensearch upgrade (eqiad) T299168
  • 23:09 ryankemper: T294805 Kicking out the final master `elastic1036` (which is also the currently elected leader); after this we'll be back to 3 masters as intended
  • 23:06 ryankemper: T294805 Running puppet and restarting elasticsearch services on `elastic1040` to make it no longer a master
  • 23:04 ryankemper: T294805 Bringing in new master `elastic1081`: `sudo systemctl restart elasticsearch_6@production-search-eqiad.service elasticsearch_6@production-search-psi-eqiad.service`
  • 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2053.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:04 ryankemper: T294805 Bringing in new master `elastic1081`: `sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent`
  • 22:59 ryankemper: T294805 `sudo systemctl restart elasticsearch_6@production-search-eqiad.service elasticsearch_6@production-search-omega-eqiad.service` on `elastic1074`
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:57 ryankemper: T294805 Running puppet agent on new master elastic1074.eqiad.wmnet: `sudo enable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805 - root" && sudo run-puppet-agent`
  • 22:48 ryankemper: T294805 Disabled puppet across all of elastic1* in preparation for bringing new master hosts in
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20235 and previous config saved to /var/cache/conftool/dbconfig/20220207-224733-ladsgroup.json
  • 22:45 inflatador: T294805 puppet-merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/736118
  • 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2052.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20234 and previous config saved to /var/cache/conftool/dbconfig/20220207-223228-ladsgroup.json
  • 22:25 cwhite: begin opensearch upgrade (eqiad) T299168
  • 22:21 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2051.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P20233 and previous config saved to /var/cache/conftool/dbconfig/20220207-221723-ladsgroup.json
  • 22:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300510)', diff saved to https://phabricator.wikimedia.org/P20232 and previous config saved to /var/cache/conftool/dbconfig/20220207-221345-ladsgroup.json
  • 22:11 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2055.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20231 and previous config saved to /var/cache/conftool/dbconfig/20220207-220218-ladsgroup.json
  • 22:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2050.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:00 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2055.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20230 and previous config saved to /var/cache/conftool/dbconfig/20220207-215840-ladsgroup.json
  • 21:46 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2049.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P20229 and previous config saved to /var/cache/conftool/dbconfig/20220207-214335-ladsgroup.json
  • 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20228 and previous config saved to /var/cache/conftool/dbconfig/20220207-213650-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300510)', diff saved to https://phabricator.wikimedia.org/P20227 and previous config saved to /var/cache/conftool/dbconfig/20220207-212830-ladsgroup.json
  • 21:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2048.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 21:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 21:09 otto@deploy1002: Finished deploy [airflow-dags/analytics-test@6d936db]: (no justification provided) (duration: 00m 08s)
  • 21:09 otto@deploy1002: Started deploy [airflow-dags/analytics-test@6d936db]: (no justification provided)
  • 21:04 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2047.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1129.eqiad.wmnet with OS bullseye
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20225 and previous config saved to /var/cache/conftool/dbconfig/20220207-205620-ladsgroup.json
  • 20:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20223 and previous config saved to /var/cache/conftool/dbconfig/20220207-204115-ladsgroup.json
  • 20:34 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2046.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1129.eqiad.wmnet with OS bullseye
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300510)', diff saved to https://phabricator.wikimedia.org/P20222 and previous config saved to /var/cache/conftool/dbconfig/20220207-203120-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 20:30 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@9afb96d]: (no justification provided) (duration: 00m 08s)
  • 20:30 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@9afb96d]: (no justification provided)
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P20221 and previous config saved to /var/cache/conftool/dbconfig/20220207-202611-ladsgroup.json
  • 20:23 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: old kernel
  • 20:23 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: old kernel
  • 20:19 eileen: revision 7dcdc017 -> ccd5afc3 civicrm update
  • 20:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@ef5783e]: (no justification provided) (duration: 00m 07s)
  • 20:18 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@ef5783e]: (no justification provided)
  • 20:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20220 and previous config saved to /var/cache/conftool/dbconfig/20220207-201106-ladsgroup.json
  • 20:08 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 20:08 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 20:05 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync on main
  • 19:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2045.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:55 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply on main
  • 19:44 mforns@deploy1002: Finished deploy [airflow-dags/analytics-test@c83a4bc]: (no justification provided) (duration: 00m 08s)
  • 19:44 mforns@deploy1002: Started deploy [airflow-dags/analytics-test@c83a4bc]: (no justification provided)
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20219 and previous config saved to /var/cache/conftool/dbconfig/20220207-194020-ladsgroup.json
  • 19:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298554)', diff saved to https://phabricator.wikimedia.org/P20218 and previous config saved to /var/cache/conftool/dbconfig/20220207-194013-ladsgroup.json
  • 19:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20217 and previous config saved to /var/cache/conftool/dbconfig/20220207-192508-ladsgroup.json
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc2044.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P20216 and previous config saved to /var/cache/conftool/dbconfig/20220207-191003-ladsgroup.json
  • 19:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:05 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Turn on wgVectorLanguageAlertInSidebar for all wikis (T300559) (duration: 00m 49s)
  • 19:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298554)', diff saved to https://phabricator.wikimedia.org/P20215 and previous config saved to /var/cache/conftool/dbconfig/20220207-185459-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298554)', diff saved to https://phabricator.wikimedia.org/P20214 and previous config saved to /var/cache/conftool/dbconfig/20220207-183059-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS buster
  • 18:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20213 and previous config saved to /var/cache/conftool/dbconfig/20220207-180857-ladsgroup.json
  • 18:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase2020.codfw.wmnet with reason: Firmware upgrade
  • 18:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase2020.codfw.wmnet with reason: Firmware upgrade
  • 18:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on restbase2019.codfw.wmnet with reason: Firmware upgrade
  • 18:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on restbase2019.codfw.wmnet with reason: Firmware upgrade
  • 18:01 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.wmnet
  • 17:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2019.wmnet
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20212 and previous config saved to /var/cache/conftool/dbconfig/20220207-175352-ladsgroup.json
  • 17:51 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS buster
  • 17:42 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P20211 and previous config saved to /var/cache/conftool/dbconfig/20220207-173848-ladsgroup.json
  • 17:26 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2042.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS buster
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20210 and previous config saved to /var/cache/conftool/dbconfig/20220207-172343-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298554)', diff saved to https://phabricator.wikimedia.org/P20209 and previous config saved to /var/cache/conftool/dbconfig/20220207-165952-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298554)', diff saved to https://phabricator.wikimedia.org/P20208 and previous config saved to /var/cache/conftool/dbconfig/20220207-165944-ladsgroup.json
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS buster
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2029.codfw.wmnet with OS buster
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20207 and previous config saved to /var/cache/conftool/dbconfig/20220207-164439-ladsgroup.json
  • 16:41 moritzm: switch kubestagetcd2003 to plain disk storage
  • 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch to plain disk storage
  • 16:38 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2003.codfw.wmnet with reason: Switch to plain disk storage
  • 16:30 moritzm: switch kubestagetcd2002 to plain disk storage
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P20206 and previous config saved to /var/cache/conftool/dbconfig/20220207-162935-ladsgroup.json
  • 16:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch to plain disk storage
  • 16:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch to plain disk storage
  • 16:24 moritzm: switch kubestagetcd2001 to plain disk storage
  • 16:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch to plain disk storage
  • 16:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch to plain disk storage
  • 16:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2029.codfw.wmnet with OS buster
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298554)', diff saved to https://phabricator.wikimedia.org/P20205 and previous config saved to /var/cache/conftool/dbconfig/20220207-161430-ladsgroup.json
  • 16:05 moritzm: migrating instances off ganeti1021
  • 16:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bullseye
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298554)', diff saved to https://phabricator.wikimedia.org/P20204 and previous config saved to /var/cache/conftool/dbconfig/20220207-160441-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298554)', diff saved to https://phabricator.wikimedia.org/P20203 and previous config saved to /var/cache/conftool/dbconfig/20220207-160433-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20201 and previous config saved to /var/cache/conftool/dbconfig/20220207-154928-ladsgroup.json
  • 15:47 moritzm: installing pillow security updates
  • 15:44 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 02m 30s)
  • 15:41 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 15:40 jayme: updated scap to 4.3.0 on A:mw-canary, A:parsoid-canary, A:mw-jobrunner-canary, A:restbase-canary - T300804
  • 15:37 jayme: uploaded scap 4.3-0 to apt.w.o - T300804
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P20200 and previous config saved to /var/cache/conftool/dbconfig/20220207-153424-ladsgroup.json
  • 15:30 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bullseye
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298554)', diff saved to https://phabricator.wikimedia.org/P20199 and previous config saved to /var/cache/conftool/dbconfig/20220207-151917-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298554)', diff saved to https://phabricator.wikimedia.org/P20198 and previous config saved to /var/cache/conftool/dbconfig/20220207-151018-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298554)', diff saved to https://phabricator.wikimedia.org/P20197 and previous config saved to /var/cache/conftool/dbconfig/20220207-150959-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20196 and previous config saved to /var/cache/conftool/dbconfig/20220207-145454-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P20195 and previous config saved to /var/cache/conftool/dbconfig/20220207-143950-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298554)', diff saved to https://phabricator.wikimedia.org/P20194 and previous config saved to /var/cache/conftool/dbconfig/20220207-142445-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298554)', diff saved to https://phabricator.wikimedia.org/P20193 and previous config saved to /var/cache/conftool/dbconfig/20220207-141452-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:14 jbond: update ferm on bullseye
  • 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1020.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 12:44 moritzm: installing ruby2.7 security updates
  • 12:40 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:34 moritzm: revert kubestagetcd1006 to plain disk storage
  • 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:32 taavi: UTC morning deploys done
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:32 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Ensure GlobalBlocking is not loaded without CentralAuth (T299371) (2/2) (duration: 00m 48s)
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1006.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:31 moritzm: revert kubestagetcd1005 to plain disk storage
  • 12:31 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Ensure GlobalBlocking is not loaded without CentralAuth (T299371) (1/2) (duration: 00m 48s)
  • 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:27 taavi@deploy1002: Synchronized w/robots.php: Config: Migrate $wmfRealm calls to $wmgRealm (T45956) (3/3) (duration: 00m 48s)
  • 12:26 taavi@deploy1002: Synchronized wmf-config: Config: Migrate $wmfRealm calls to $wmgRealm (T45956) (2/3) (duration: 00m 48s)
  • 12:25 taavi@deploy1002: Synchronized multiversion: Config: Migrate $wmfRealm calls to $wmgRealm (T45956) (1/3) (duration: 00m 48s)
  • 12:25 volans@cumin2002: START - Cookbook sre.hosts.provision for host mc2043.mgmt.codfw.wmnet with reboot policy FORCED
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1005.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:19 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove redundant patrolmarks flag from patroller usergroup (T300913) (duration: 00m 48s)
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:17 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1009.eqiad.wmnet
  • 12:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 12:09 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop capturing media change tags (T286362) (2/2) (duration: 00m 50s)
  • 12:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 12:08 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Stop capturing media change tags (T286362) (1/2) (duration: 00m 50s)
  • 12:07 moritzm: revert kubestagetcd1004 to plain disk storage
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: Switch to plain disk storage
  • 12:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd1004.eqiad.wmnet with reason: Switch to plain disk storage
  • 11:59 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1008.eqiad.wmnet
  • 11:40 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1007.eqiad.wmnet
  • 11:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 11:18 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync on staging
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync on production
  • 11:00 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1006.eqiad.wmnet
  • 10:51 mmandere: rolling upgrade of varnish from version 6.0.9 to 6.0.10 across DCs T300264
  • 10:49 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus2004.codfw.wmnet
  • 10:49 filippo@puppetmaster1001: conftool action : set/pooled=no; selector: name=prometheus1004.eqiad.wmnet
  • 10:22 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1005.eqiad.wmnet
  • 09:59 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=aqs1004.eqiad.wmnet
  • 09:21 godog: temp-disable mfa for 'filippo' - T296629
  • 09:09 jayme: uncordoned kubernetes1014 - T301099
  • 08:02 jayme: powercycle kubernetes1014 - T301099
  • 06:20 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on kubernetes1014.eqiad.wmnet with reason: potential HW error
  • 06:20 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on kubernetes1014.eqiad.wmnet with reason: potential HW error
  • 06:10 jayme: draining kubernetes1014

2022-02-05

  • 22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 21:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 20:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 19:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 18:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 17:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 16:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 06:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 06:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 05:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye

2022-02-04

  • 23:43 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 23:43 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 23:02 inflatador: bking@deployment-puppetmaster04 local commit to public/private repo, see T299797 for more details
  • 22:37 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 22:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 19:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2002-dev.wikimedia.org with OS bullseye
  • 18:52 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2002-dev.wikimedia.org with OS bullseye
  • 17:00 arturo: add mcrouter 2022.01.31.00-1 to bullseye-wikimedia (T300578)
  • 16:48 jbond: add new ferm package ferm_2.5.1-1+wmf11u2
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:05 elukey: unmask prometheus-mysqld-exporter.service and clean up the old @analytics + wmf_auto_restart units (service+timer) not used anymore on an-coord100[12]
  • 14:25 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:18 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1020.eqiad.wmnet with OS buster
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20174 and previous config saved to /var/cache/conftool/dbconfig/20220204-114117-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20173 and previous config saved to /var/cache/conftool/dbconfig/20220204-112613-root.json
  • 11:14 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1020.eqiad.wmnet with OS buster
  • 11:13 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20172 and previous config saved to /var/cache/conftool/dbconfig/20220204-111110-root.json
  • 11:07 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Remove all special groups from s1 codfw T263127', diff saved to https://phabricator.wikimedia.org/P20171 and previous config saved to /var/cache/conftool/dbconfig/20220204-110427-marostegui.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20170 and previous config saved to /var/cache/conftool/dbconfig/20220204-105606-root.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P20165 and previous config saved to /var/cache/conftool/dbconfig/20220204-104102-root.json
  • 10:40 moritzm: rebalancing row A in ganeti/eqiad, all nodes of that row are now running Buster T296721
  • 10:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1008.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1008.eqiad.wmnet to ganeti01.svc.eqiad.wmnet
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1008.eqiad.wmnet
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1008.eqiad.wmnet
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove watchlist group from s4 eqiad T263127', diff saved to https://phabricator.wikimedia.org/P20164 and previous config saved to /var/cache/conftool/dbconfig/20220204-082010-marostegui.json
  • 07:18 elukey: `git checkout main.html` on miscweb1002:/srv/org/wikidata/query to avoid puppet corrective actions (and the host being listed in alarms)
  • 07:09 elukey: cleanup wmf_auto_restart_prometheus-mysqld-exporter@analytics-meta on an-test-coord1001 and unmasked wmf_auto_restart_prometheus-mysqld-exporter (now used)
  • 07:03 elukey: clean up wmf_auto_restart_prometheus-mysqld-exporter@matomo on matomo1002 (not used anymore, listed as failed)
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 schema change', diff saved to https://phabricator.wikimedia.org/P20163 and previous config saved to /var/cache/conftool/dbconfig/20220204-070003-marostegui.json
  • 06:00 legoktm: uploaded pygments 2.11.2 to apt.wm.o (T298399)
  • 02:48 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2035.codfw.wmnet
  • 02:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic2035.codfw.wmnet
  • 02:41 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic2035.codfw.wmnet
  • 01:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 01:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 01:04 brennen: for-real end of utc late backport & config window
  • 01:04 brennen@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/Thanks/modules/ext.thanks.flowthank.js: Backport: Correct attribute for flow thanks (T300831) (duration: 00m 49s)
  • 00:50 brennen: reopening utc late backport window for Correct attribute for flow thanks (T300831)
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:12 cjming: end of UTC late backport & config window
  • 00:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update icons, wordmark for test wikis (T299512) (duration: 00m 49s)
  • 00:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 00:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 00:10 cjming@deploy1002: Synchronized static/images/mobile/copyright/: Config: Update icons, wordmark for test wikis (T299512) (duration: 00m 53s)
  • 00:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn

2022-02-03

  • 23:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20159 and previous config saved to /var/cache/conftool/dbconfig/20220203-233447-marostegui.json
  • 23:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P20158 and previous config saved to /var/cache/conftool/dbconfig/20220203-231942-marostegui.json
  • 23:15 ryankemper: T294805 Added a silence on alerts.wikimedia.org for `CirrusSearchJVMGCOldPoolFlatlined`
  • 23:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P20157 and previous config saved to /var/cache/conftool/dbconfig/20220203-230437-marostegui.json
  • 22:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20156 and previous config saved to /var/cache/conftool/dbconfig/20220203-224933-marostegui.json
  • 22:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300402)', diff saved to https://phabricator.wikimedia.org/P20155 and previous config saved to /var/cache/conftool/dbconfig/20220203-223923-marostegui.json
  • 22:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300402)', diff saved to https://phabricator.wikimedia.org/P20154 and previous config saved to /var/cache/conftool/dbconfig/20220203-223916-marostegui.json
  • 22:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P20153 and previous config saved to /var/cache/conftool/dbconfig/20220203-222411-marostegui.json
  • 22:18 ryankemper: T294805 Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?orgId=1&var-cirrus_group=eqiad&var-cluster=elasticsearch&var-exported_cluster=production-search&var-smoothing=1&refresh=1m&from=now-3h&to=now as new hosts join the fleet
  • 22:18 ryankemper: T294805 Bringing in new eqiad hosts in batches of 4, with 15-20 mins between batches: `ryankemper@cumin1001:~$ sudo -E cumin -b 4 'elastic1*' 'sudo run-puppet-agent --force; sudo run-puppet-agent; sleep 900'` tmux session `es_eqiad`
  • 22:13 ryankemper: T294805 https://gerrit.wikimedia.org/r/c/operations/puppet/+/759617/ fixed the dependency issues, going to start bringing new hosts into service
  • 22:09 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P20152 and previous config saved to /var/cache/conftool/dbconfig/20220203-220906-marostegui.json
  • 22:05 eileen: civicrm revision 7dcdc017 -> 04cbf35b
  • 22:04 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 21:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300402)', diff saved to https://phabricator.wikimedia.org/P20150 and previous config saved to /var/cache/conftool/dbconfig/20220203-215402-marostegui.json
  • 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300402)', diff saved to https://phabricator.wikimedia.org/P20149 and previous config saved to /var/cache/conftool/dbconfig/20220203-215154-marostegui.json
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300402)', diff saved to https://phabricator.wikimedia.org/P20148 and previous config saved to /var/cache/conftool/dbconfig/20220203-215121-marostegui.json
  • 21:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P20147 and previous config saved to /var/cache/conftool/dbconfig/20220203-213616-marostegui.json
  • 21:28 rzl: root@apt1001:/home/rzl# reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy # T300324
  • 21:27 rzl: root@apt1001:/home/rzl# reprepro copy stretch-wikimedia buster-wikimedia envoyproxy # T300324
  • 21:21 ryankemper: T294805 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/759588; hoping this resolves dependency issues. Running puppet agent on `elastic1068`
  • 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P20145 and previous config saved to /var/cache/conftool/dbconfig/20220203-212111-marostegui.json
  • 21:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300402)', diff saved to https://phabricator.wikimedia.org/P20144 and previous config saved to /var/cache/conftool/dbconfig/20220203-210607-marostegui.json
  • 21:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300402)', diff saved to https://phabricator.wikimedia.org/P20143 and previous config saved to /var/cache/conftool/dbconfig/20220203-210358-marostegui.json
  • 21:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 21:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 21:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300402)', diff saved to https://phabricator.wikimedia.org/P20142 and previous config saved to /var/cache/conftool/dbconfig/20220203-210350-marostegui.json
  • 20:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P20140 and previous config saved to /var/cache/conftool/dbconfig/20220203-204846-marostegui.json
  • 20:43 rzl: rzl@mwmaint1002:~$ sudo systemctl start mediawiki_job_recount_categories.service # T299823
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P20139 and previous config saved to /var/cache/conftool/dbconfig/20220203-203341-marostegui.json
  • 20:26 ryankemper: T294805 Running puppet on `elastic1068` failed, looks like `/usr/share/elasticsearch/lib` wasn't there: https://phabricator.wikimedia.org/P20138
  • 20:25 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx1001.wikimedia.org with reason: systemd testing
  • 20:25 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mx1001.wikimedia.org with reason: systemd testing
  • 20:22 ryankemper: T294805 Running puppet on single elastic host: `ryankemper@elastic1068:~$ sudo run-puppet-agent --force`
  • 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300402)', diff saved to https://phabricator.wikimedia.org/P20137 and previous config saved to /var/cache/conftool/dbconfig/20220203-201836-marostegui.json
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300402)', diff saved to https://phabricator.wikimedia.org/P20136 and previous config saved to /var/cache/conftool/dbconfig/20220203-201729-marostegui.json
  • 20:17 ryankemper: T294805 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/759317 to activate roles for elastic eqiad replacement hosts
  • 20:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 20:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300402)', diff saved to https://phabricator.wikimedia.org/P20135 and previous config saved to /var/cache/conftool/dbconfig/20220203-201721-marostegui.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:16 ryankemper: T294805 Disabled puppet on `elastic1*` in preparation for bringing new hosts into service: `ryankemper@cumin1001:~$ sudo cumin 'elastic1*' 'sudo disable-puppet "Add new eqiad replacement hosts elastic10[68-83] - T294805"'`
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 20:11 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.20 refs T293961
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:08 mutante: planet1002/planet2002 - sudo systemctl start planet-update-en to manually start update after adding diff.wikimedia.org T230444
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 20:07 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/includes/Hooks.php: Backport: Drop skin override (T300814) (2/2) (duration: 00m 49s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 20:06 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/skin.json: Backport: Drop skin override (T300814) (1/2) (duration: 00m 49s)
  • 20:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P20134 and previous config saved to /var/cache/conftool/dbconfig/20220203-200217-marostegui.json
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P20133 and previous config saved to /var/cache/conftool/dbconfig/20220203-194712-marostegui.json
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:41 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1003.eqiad.wmnet with OS buster
  • 19:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 19:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:39 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudbackup1004.eqiad.wmnet with OS buster
  • 19:35 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1003.eqiad.wmnet with OS buster
  • 19:34 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/skins/Vector/includes/Hooks.php: Backport: Pass skin name to Hooks::isSkinLegacy (T299971) (duration: 00m 49s)
  • 19:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:33 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/ContentTranslation/modules/entrypoints/ext.cx.entrypoints.contributionsmenu.js: Backport: Update skin checks with new vector skin key. (T298916 T300814) (duration: 00m 50s)
  • 19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1004.eqiad.wmnet with OS buster
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300402)', diff saved to https://phabricator.wikimedia.org/P20132 and previous config saved to /var/cache/conftool/dbconfig/20220203-193208-marostegui.json
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:29 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/WikiEditor/modules/ext.wikiEditor.js: Backport: New bucket for abtest data (T291308) (2/2) (duration: 00m 50s)
  • 19:28 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/WikiEditor/includes/Hooks.php: Backport: New bucket for abtest data (T291308) (1/2) (duration: 00m 49s)
  • 19:27 taavi@deploy1002: Synchronized php-1.38.0-wmf.20/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.trackSubscriber.js: Backport: New bucket for abtest data (T291308) (duration: 00m 50s)
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:26 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add three domains to the wgCopyUploadsDomains allowlist (T299835 T300848) (duration: 00m 54s)
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync on pinkunicorn
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply on pinkunicorn
  • 18:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:36 marostegui@cumin1001: