Server Admin Log/Archive 50

From Wikitech

2022-03-31

  • 23:45 mutante: gitlab2001 - fdisk /dev/vdb (g, w) (create partition table), (n, w) (create partition) ; mkfs.ext4 /dev/vdb1 (create filesystem); systemctl reset-failed (fix Icinga alert); mkdir /mnt/gitlab-backup; mount /dev/vdb1 /mnt/gitlab-backup ; blkid (get UUID); edit /etc/fstab and insert "UUID=c5235682-ac21-46a9-85ee-9603f694a6a4 /mnt/gitlab-backup ext4 errors=remount-ro 0 2" T274463
  • 23:27 mutante: gitlab2001 - rebooted on ganeti level (needed when adding new virtual hardware), then ran into the usual bug T272555 where you have to manually fix the interface in /etc/network/interfaces T274463
  • 23:21 mutante: gitlab2001 (gitlab-replica.wikimedia.org) - rebooting to add new virtual disk T274463
  • 23:11 ejegg: updated payments-wiki from 47d9bd27 to 6f888c28
  • 23:01 bblack: esams->drmrs failover test begins - T304089
  • 22:34 moritzm: updated CAS to 6.4.6.2
  • 22:28 mutante: ganeti - creating new 100G virtual disk on gitlab1001 T274463
  • 22:24 mutante: ganeti - creating new 100G virtual disk on gitlab2001 T274463
  • 22:16 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:51 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:48 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:40 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:19 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^(cp1075|cp1079|cp2035|cp3050|cp3051|cp3052|cp3054|cp4022|cp5013|cp5014|cp5015).*
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:17 bblack@cumin1001: conftool action : select; selector: name="^(cp1075|cp1079|cp2035|cp3050|cp3051|cp3052|cp3054|cp4022|cp5013|cp5014|cp5015).*"
  • 21:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Remove unused Flow config (duration: 00m 49s)
  • 21:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp5012.eqsin.wmnet
  • 21:07 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:06 thcipriani: utc late backport complete
  • 21:03 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:56 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.Homepage.SuggestedEdits/MatchModeSelectWidget.less: Backport: Newcomer tasks: always align button and text to the right (T301825) (duration: 00m 50s)
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:49 thcipriani@deploy1002: Synchronized tests: Config (noop -- tests) (duration: 00m 50s)
  • 20:47 thcipriani@deploy1002: Synchronized src/StaticSiteConfiguration.php: Config (noop -- comment change): phpcs: enable and fix PropertyDocumentation.MissingVar (T171115) (duration: 00m 50s)
  • 20:46 thcipriani@deploy1002: Synchronized phpcs.xml: Config (noop): phpcs: enable and fix PropertyDocumentation.MissingVar (T171115) phpcs: rename test files to match class names (T171115) phpcs: enable rules that are already passing (T171115) (duration: 00m 49s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 mutante: reserving port 4017 for new k8s service request 'image-suggestions' T304891
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Stop writing to $wmfLocalServices (T45956) (duration: 00m 50s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:29 thcipriani@deploy1002: Synchronized wmf-config: Config: Migrate $wmfLocalServices to $wmgLocalServices (T45956) (duration: 00m 51s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2007.codfw.wmnet
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:22 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 20:22 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgLocalServices the same value as to $wmfLocalServices (T45956) (duration: 00m 50s)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:21 mutante: contint2002 - reboot (insetup host)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 20:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2007.codfw.wmnet
  • 20:16 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2035.codfw.wmnet,service=ats-be
  • 20:16 thcipriani@deploy1002: Synchronized wmf-config/PhpAutoPrepend.php: Config: Migrate $wmfServiceConfig to $wmgServiceConfig (T45956) (duration: 00m 50s)
  • 20:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 20:12 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5001.eqsin.wmnet
  • 20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp1075.eqiad.wmnet
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2376.codfw.wmnet
  • 20:10 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2374.codfw.wmnet
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:09 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2252.codfw.wmnet
  • 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 20:08 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2251.codfw.wmnet
  • 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 20:07 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5001.eqsin.wmnet
  • 20:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5014.eqsin.wmnet
  • 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2376.codfw.wmnet
  • 20:05 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2374.codfw.wmnet
  • 20:04 mutante: mw2271,mw2222 - canary appserver, rebooting
  • 20:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2035.codfw.wmnet
  • 20:04 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4005.ulsfo.wmnet
  • 20:01 mutante: mw2251,mw2252 - canary appserver, rebooting
  • 20:00 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4005.ulsfo.wmnet
  • 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2272.codfw.wmnet
  • 19:59 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 19:58 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2252.codfw.wmnet
  • 19:57 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=mw2251.codfw.wmnet
  • 19:55 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
  • 19:46 mutante: phab2001 - systemctl restart ssh-phab
  • 19:45 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
  • 19:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
  • 19:43 rzl: Rolling-restarted zotero to un-wedge wedged pods with offscale high CPU
  • 19:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
  • 19:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
  • 19:38 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2008.codfw.wmnet
  • 19:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5014.eqsin.wmnet
  • 19:31 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
  • 19:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
  • 19:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
  • 19:27 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5015.eqsin.wmnet
  • 19:26 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2008.codfw.wmnet
  • 19:24 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
  • 19:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1015.eqiad.wmnet
  • 19:23 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 19:21 cwhite: remove openjdk-8-jre from eqiad logstash nodes T301770
  • 19:21 mutante: phab2001 - powercycling via mgmt
  • 19:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1015.eqiad.wmnet
  • 19:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1014.eqiad.wmnet
  • 19:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 19:17 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:15 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1014.eqiad.wmnet
  • 19:15 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1013.eqiad.wmnet
  • 19:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 19:14 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
  • 19:14 mutante: phab2001 - git-ssh.codfw - rebooting - might cause pybal alert
  • 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5015.eqsin.wmnet
  • 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4022.ulsfo.wmnet
  • 19:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1013.eqiad.wmnet
  • 19:09 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2035.codfw.wmnet
  • 19:07 bblack@cumin1001: conftool action : set/pooled=yes; selector: cluster=ml_staging
  • 19:07 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp1075.eqiad.wmnet
  • 19:07 bblack@cumin1001: conftool action : set/weight=1; selector: cluster=ml_staging
  • 19:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5013.eqsin.wmnet
  • 19:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
  • 19:06 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5002.eqsin.wmnet
  • 19:05 mutante: doc.wikimedia.org - short downtime due to maintenance, rebooting doc1001
  • 19:02 mutante: testreduce1001 - needed manual nginx restart after reboot to make https://parsoid-rt-tests.wikimedia.org/ work again
  • 19:01 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5002.eqsin.wmnet
  • 19:00 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include bullseye-wikimedia /home/rzl/httpbb/bullseye/httpbb_0.0.1-1+deb11u1_source.changes
  • 19:00 mutante: testreduce1001 - rebooting
  • 18:59 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4006.ulsfo.wmnet
  • 18:59 mutante: https://parsoid-rt-tests.wikimedia.org/ - short downtime due to maintenance
  • 18:59 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4022.ulsfo.wmnet
  • 18:56 mutante: scandium - rebooting
  • 18:54 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4006.ulsfo.wmnet
  • 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
  • 18:53 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5013.eqsin.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3054.esams.wmnet
  • 18:50 mutante: mwdebug1001 - rebooting
  • 18:49 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
  • 18:43 duesen: removing /var/run/php/use-config-schema from canaries mw1415, mw1438, and mw1448 to disable config schema loading (T304460)
  • 18:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
  • 18:36 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3054.esams.wmnet
  • 18:36 mutante: gerrit-replica.wikimedia.org short downtime, rebooting gerrit2001
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:23 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/TimedMediaHandler/resources/ext.tmh.player.styles.less: Backport: Set noflip for css rule that needs it (T305156) (duration: 00m 51s)
  • 18:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:20 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2009.codfw.wmnet
  • 18:19 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@ba88f51]: 0.3.109 (duration: 07m 24s)
  • 18:14 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns2001.wikimedia.org
  • 18:13 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.109` on canary `wdqs1003`; proceeding to rest of fleet
  • 18:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@ba88f51]: 0.3.109
  • 18:11 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.109`. Pre-deploy tests passing on canary `wdqs1003`
  • 18:08 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2009.codfw.wmnet
  • 18:03 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 17:57 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 17:52 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns2001.wikimedia.org
  • 17:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host authdns1001.wikimedia.org
  • 17:41 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host authdns1001.wikimedia.org
  • 17:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 17:31 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1001.wikimedia.org
  • 17:30 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 17:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5003.eqsin.wmnet
  • 17:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1001.wikimedia.org
  • 17:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2001.wikimedia.org
  • 17:24 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5003.eqsin.wmnet
  • 17:24 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4007.ulsfo.wmnet
  • 17:17 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4007.ulsfo.wmnet
  • 17:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Maint', diff saved to https://phabricator.wikimedia.org/P24019 and previous config saved to /var/cache/conftool/dbconfig/20220331-171724-ladsgroup.json
  • 17:10 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
  • 17:10 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2010.codfw.wmnet
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Maint', diff saved to https://phabricator.wikimedia.org/P24018 and previous config saved to /var/cache/conftool/dbconfig/20220331-170221-ladsgroup.json
  • 16:58 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs2010.codfw.wmnet
  • 16:58 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 16:57 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
  • 16:55 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2001.wikimedia.org
  • 16:54 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3001.wikimedia.org
  • 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 16:51 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
  • 16:51 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5002.wikimedia.org
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Maint', diff saved to https://phabricator.wikimedia.org/P24017 and previous config saved to /var/cache/conftool/dbconfig/20220331-164717-ladsgroup.json
  • 16:47 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3001.wikimedia.org
  • 16:47 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4001.wikimedia.org
  • 16:42 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5002.wikimedia.org
  • 16:42 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4002.wikimedia.org
  • 16:37 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4001.wikimedia.org
  • 16:37 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5001.wikimedia.org
  • 16:33 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns4002.wikimedia.org
  • 16:33 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns3002.wikimedia.org
  • 16:33 duesen: creating /var/run/php/use-config-schema on canaries mw1415, mw1438, and mw1448 to enable config schema loading (T304460)
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Maint', diff saved to https://phabricator.wikimedia.org/P24016 and previous config saved to /var/cache/conftool/dbconfig/20220331-163213-ladsgroup.json
  • 16:28 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns5001.wikimedia.org
  • 16:28 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
  • 16:25 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns3002.wikimedia.org
  • 16:25 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns1002.wikimedia.org
  • 16:20 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
  • 16:19 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns1002.wikimedia.org
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Maint', diff saved to https://phabricator.wikimedia.org/P24015 and previous config saved to /var/cache/conftool/dbconfig/20220331-161709-ladsgroup.json
  • 16:17 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns2002.wikimedia.org
  • 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
  • 16:11 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host dns2002.wikimedia.org
  • 16:11 bblack@cumin1001: START - Cookbook sre.hosts.reboot-single for host dns2002.wikimedia.org
  • 15:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:44 mmandere: pool cp6016 with HAProxy as TLS termination layer - T290005
  • 15:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
  • 15:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 15:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 15:15 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 15:13 mmandere: pool cp5009 with HAProxy as TLS termination layer - T290005
  • 15:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:11 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:10 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 12 hosts with reason: reboot for update T304938
  • 15:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5009.eqsin.wmnet with OS buster
  • 15:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on 12 hosts with reason: reboot for update T304938
  • 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update T304938
  • 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum[1001-1002].eqiad.wmnet with reason: reboot for update T304938
  • 15:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
  • 14:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh6001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh5001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:50 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 mmandere: depool cp6016 for reimage - T290005
  • 14:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on doh4002.wikimedia.org with reason: reboot for kernel update T304938
  • 14:44 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh4001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:43 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh4001.wikimedia.org with reason: reboot for kernel update T304938
  • 14:39 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5009.eqsin.wmnet with reason: host reimage
  • 14:36 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5009.eqsin.wmnet with reason: host reimage
  • 14:22 duesen: (late) about 5 hours ago, I removed /var/run/php/use-config-schema from mw1415 to disable config schema loading (T304460)
  • 14:09 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5009.eqsin.wmnet with OS buster
  • 14:05 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5009.eqsin.wmnet with OS buster
  • 14:03 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp5009.eqsin.wmnet with OS buster
  • 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:02 moritzm: installing vim security updates on buster
  • 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1002.wikimedia.org
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:56 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:55 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/includes/changetags/ChangeTags.php: Backport: ChangeTags: Use localizer with correct page title to parse messages (T302754) (duration: 00m 51s)
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:53 mmandere: depool cp5009 for reimage - T290005
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1002.wikimedia.org
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2001.wikimedia.org
  • 13:51 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/resources/src/mediawiki.special.createaccount/HtmlformChecker.js: Backport: Fix error/warning boxes on signup form (T305098) (duration: 00m 50s)
  • 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2001.wikimedia.org
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/CentralAuth/includes/Special/GlobalUsersPager.php: Backport: Revert "GlobalUsersPager: add gu_id to GROUP BY" (duration: 00m 50s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.5/tests/phpunit/structure/SpecialPageFatalTest.php: Backport: Revert "Add SpecialPageFatalTest to @group Database" (no-op) (duration: 00m 50s)
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Configure `mul` language code on Test Wikidata and its clients (T297393) (2/2) (duration: 00m 50s)
  • 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Configure `mul` language code on Test Wikidata and its clients (T297393) (1/2) (duration: 00m 51s)
  • 13:03 mmandere: pool cp4023 with HAProxy as TLS termination layer - T290005
  • 12:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4023.ulsfo.wmnet with OS buster
  • 12:53 mmandere: pool cp3057 with HAProxy as TLS termination layer - T290005
  • 12:50 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3057.esams.wmnet with OS buster
  • 12:48 XioNoX: analytics1-b/c/d-eqiad: replace firewall filter with strict uRPF - T298087
  • 12:31 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4023.ulsfo.wmnet with reason: host reimage
  • 12:28 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4023.ulsfo.wmnet with reason: host reimage
  • 12:25 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P24013 and previous config saved to /var/cache/conftool/dbconfig/20220331-122247-marostegui.json
  • 12:22 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3057.esams.wmnet with reason: host reimage
  • 12:12 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4023.ulsfo.wmnet with OS buster
  • 12:07 mmandere: depool cp4023 for reimage - T290005
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P24012 and previous config saved to /var/cache/conftool/dbconfig/20220331-120742-marostegui.json
  • 12:04 moritzm: installing wireshark security updates
  • 11:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3057.esams.wmnet with OS buster
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P24011 and previous config saved to /var/cache/conftool/dbconfig/20220331-115235-marostegui.json
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 11:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 11:39 mmandere: depool cp3057 for reimage - T290005
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P24010 and previous config saved to /var/cache/conftool/dbconfig/20220331-113730-marostegui.json
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
  • 11:19 moritzm: installing libpcap security updates
  • 11:16 mmandere: pool cp3056 with HAProxy as TLS termination layer - T290005
  • 11:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS buster
  • 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:55 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:53 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:53 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:44 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 10:41 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1002.eqiad.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1002.eqiad.wmnet
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P24009 and previous config saved to /var/cache/conftool/dbconfig/20220331-102819-marostegui.json
  • 10:26 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2002.codfw.wmnet
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2002.codfw.wmnet
  • 10:14 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS buster
  • 10:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P24007 and previous config saved to /var/cache/conftool/dbconfig/20220331-101314-marostegui.json
  • 10:12 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:12 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb1002.eqiad.wmnet
  • 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host miscweb1002.eqiad.wmnet
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2002.codfw.wmnet
  • 10:00 mmandere: pool cp4029 with HAProxy as TLS termination layer - T290005
  • 10:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P24006 and previous config saved to /var/cache/conftool/dbconfig/20220331-095809-marostegui.json
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host miscweb2002.codfw.wmnet
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P24005 and previous config saved to /var/cache/conftool/dbconfig/20220331-095319-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P24004 and previous config saved to /var/cache/conftool/dbconfig/20220331-095228-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P24003 and previous config saved to /var/cache/conftool/dbconfig/20220331-094304-marostegui.json
  • 09:43 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4029.ulsfo.wmnet with OS buster
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P24002 and previous config saved to /var/cache/conftool/dbconfig/20220331-093725-root.json
  • 09:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 09:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3056.esams.wmnet with OS buster
  • 09:25 duesen: removed /var/run/php/use-config-schema from mwdebug1002 to disable config schema loading (T304460)
  • 09:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1003.eqiad.wmnet
  • 09:23 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P24001 and previous config saved to /var/cache/conftool/dbconfig/20220331-092221-root.json
  • 09:21 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4029.ulsfo.wmnet with reason: host reimage
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
  • 09:18 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4029.ulsfo.wmnet with reason: host reimage
  • 09:18 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
  • 09:16 duesen: created /var/run/php/use-config-schema on canary mw1415 to enable config schema loading (T304460)
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P24000 and previous config saved to /var/cache/conftool/dbconfig/20220331-091626-marostegui.json
  • 09:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:09 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:08 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ms-be1069.eqiad.wmnet with reason: Puppet errors during reimage
  • 09:08 duesen: created /var/run/php/use-config-schema on mwdebug1002 to enable config schema loading (T304460)
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23999 and previous config saved to /var/cache/conftool/dbconfig/20220331-090717-root.json
  • 09:02 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp4029.ulsfo.wmnet with OS buster
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp1001.wikimedia.org
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
  • 08:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
  • 08:57 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 08:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp1001.wikimedia.org
  • 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3056.esams.wmnet with reason: host reimage
  • 08:53 mmandere: depool cp4029 for reimage - T290005
  • 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1001.eqiad.wmnet
  • 08:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1001.eqiad.wmnet
  • 08:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 08:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:40 XioNoX: analytics1-a-eqiad: replace firewall filter with strict uRPF - T298087
  • 08:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-corp2001.wikimedia.org
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:35 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.5 refs T300204
  • 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-corp2001.wikimedia.org
  • 08:30 hashar@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/OATHAuth/src/OATHUserRepository.php: Backport: Revert "OATHUserRepository: Stop handling legacy single-key" (T305029) (duration: 00m 51s)
  • 08:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298557)', diff saved to https://phabricator.wikimedia.org/P23997 and previous config saved to /var/cache/conftool/dbconfig/20220331-082525-marostegui.json
  • 08:25 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp3056.esams.wmnet with OS buster
  • 08:19 daniel@deploy1002: Synchronized php-1.39.0-wmf.5/extensions/GrowthExperiments/modules/ext.growthExperiments.PostEdit/index.js: Backport: Post-edit dialog: check for presence of preferences.topicFilters (T305057) (duration: 00m 53s)
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23996 and previous config saved to /var/cache/conftool/dbconfig/20220331-081020-marostegui.json
  • 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23995 and previous config saved to /var/cache/conftool/dbconfig/20220331-075515-marostegui.json
  • 07:41 mmandere: depool cp3056 for reimage - T290005
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298557)', diff saved to https://phabricator.wikimedia.org/P23994 and previous config saved to /var/cache/conftool/dbconfig/20220331-074010-marostegui.json
  • 07:30 daniel@deploy1002: Synchronized multiversion/defines.php: Config: Set MW_USE_CONFIG_SCHEMA constant if file exists. (T304460) (duration: 00m 52s)
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 moritzm: updating libapache2-mod-auth-cas on buster hosts
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:48 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23993 and previous config saved to /var/cache/conftool/dbconfig/20220331-063429-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23992 and previous config saved to /var/cache/conftool/dbconfig/20220331-061923-ladsgroup.json
  • 06:12 marostegui: dbmaint s5@eqiad T300381
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T303798', diff saved to https://phabricator.wikimedia.org/P23991 and previous config saved to /var/cache/conftool/dbconfig/20220331-060820-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298557)', diff saved to https://phabricator.wikimedia.org/P23990 and previous config saved to /var/cache/conftool/dbconfig/20220331-060517-marostegui.json
  • 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298557)', diff saved to https://phabricator.wikimedia.org/P23989 and previous config saved to /var/cache/conftool/dbconfig/20220331-060509-marostegui.json
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23988 and previous config saved to /var/cache/conftool/dbconfig/20220331-060418-ladsgroup.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T303798', diff saved to https://phabricator.wikimedia.org/P23987 and previous config saved to /var/cache/conftool/dbconfig/20220331-060122-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T303798', diff saved to https://phabricator.wikimedia.org/P23986 and previous config saved to /var/cache/conftool/dbconfig/20220331-060042-root.json
  • 06:00 marostegui: Starting s5 eqiad failover from db1130 to db1100 - T303798
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23985 and previous config saved to /var/cache/conftool/dbconfig/20220331-055004-marostegui.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23984 and previous config saved to /var/cache/conftool/dbconfig/20220331-054913-ladsgroup.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23983 and previous config saved to /var/cache/conftool/dbconfig/20220331-053459-marostegui.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298557)', diff saved to https://phabricator.wikimedia.org/P23981 and previous config saved to /var/cache/conftool/dbconfig/20220331-051954-marostegui.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23980 and previous config saved to /var/cache/conftool/dbconfig/20220331-044859-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 04:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23979 and previous config saved to /var/cache/conftool/dbconfig/20220331-044851-ladsgroup.json
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T303798', diff saved to https://phabricator.wikimedia.org/P23978 and previous config saved to /var/cache/conftool/dbconfig/20220331-043906-marostegui.json
  • 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 22 hosts with reason: Primary switchover s5 T303798
  • 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 22 hosts with reason: Primary switchover s5 T303798
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23977 and previous config saved to /var/cache/conftool/dbconfig/20220331-043346-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23976 and previous config saved to /var/cache/conftool/dbconfig/20220331-041841-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23975 and previous config saved to /var/cache/conftool/dbconfig/20220331-040940-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23974 and previous config saved to /var/cache/conftool/dbconfig/20220331-040916-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23973 and previous config saved to /var/cache/conftool/dbconfig/20220331-040336-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23972 and previous config saved to /var/cache/conftool/dbconfig/20220331-035411-ladsgroup.json
  • 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298557)', diff saved to https://phabricator.wikimedia.org/P23971 and previous config saved to /var/cache/conftool/dbconfig/20220331-034709-marostegui.json
  • 03:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 03:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 03:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298557)', diff saved to https://phabricator.wikimedia.org/P23970 and previous config saved to /var/cache/conftool/dbconfig/20220331-034701-marostegui.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23969 and previous config saved to /var/cache/conftool/dbconfig/20220331-033906-ladsgroup.json
  • 03:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23968 and previous config saved to /var/cache/conftool/dbconfig/20220331-033156-marostegui.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23967 and previous config saved to /var/cache/conftool/dbconfig/20220331-032401-ladsgroup.json
  • 03:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23966 and previous config saved to /var/cache/conftool/dbconfig/20220331-031651-marostegui.json
  • 03:15 ejegg: civicrm revision changed from a6f49bb3 to 84c737b6
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23965 and previous config saved to /var/cache/conftool/dbconfig/20220331-030531-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23964 and previous config saved to /var/cache/conftool/dbconfig/20220331-030523-ladsgroup.json
  • 03:04 eileen: civicrm revision changed from a9c323af to a6f49bb3
  • 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23963 and previous config saved to /var/cache/conftool/dbconfig/20220331-030321-ladsgroup.json
  • 03:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 03:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23962 and previous config saved to /var/cache/conftool/dbconfig/20220331-030313-ladsgroup.json
  • 03:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298557)', diff saved to https://phabricator.wikimedia.org/P23961 and previous config saved to /var/cache/conftool/dbconfig/20220331-030146-marostegui.json
  • 02:50 catrope@deploy1002: Synchronized multiversion/MWConfigCacheGenerator.php: Code style-only change to MWConfigCacheGenerator.php (duration: 00m 52s)
  • 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23960 and previous config saved to /var/cache/conftool/dbconfig/20220331-025018-ladsgroup.json
  • 02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23959 and previous config saved to /var/cache/conftool/dbconfig/20220331-024808-ladsgroup.json
  • 02:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23958 and previous config saved to /var/cache/conftool/dbconfig/20220331-023513-ladsgroup.json
  • 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23957 and previous config saved to /var/cache/conftool/dbconfig/20220331-023303-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23956 and previous config saved to /var/cache/conftool/dbconfig/20220331-022008-ladsgroup.json
  • 02:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23955 and previous config saved to /var/cache/conftool/dbconfig/20220331-021758-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23954 and previous config saved to /var/cache/conftool/dbconfig/20220331-021450-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23953 and previous config saved to /var/cache/conftool/dbconfig/20220331-021413-ladsgroup.json
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23952 and previous config saved to /var/cache/conftool/dbconfig/20220331-020643-ladsgroup.json
  • 02:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23951 and previous config saved to /var/cache/conftool/dbconfig/20220331-020635-ladsgroup.json
  • 01:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23950 and previous config saved to /var/cache/conftool/dbconfig/20220331-015908-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23949 and previous config saved to /var/cache/conftool/dbconfig/20220331-015130-ladsgroup.json
  • 01:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23948 and previous config saved to /var/cache/conftool/dbconfig/20220331-014403-ladsgroup.json
  • 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300775)', diff saved to https://phabricator.wikimedia.org/P23947 and previous config saved to /var/cache/conftool/dbconfig/20220331-014140-marostegui.json
  • 01:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:38 eileen: revision changed from 4bb3ec09 to a9c323af
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23946 and previous config saved to /var/cache/conftool/dbconfig/20220331-013625-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23945 and previous config saved to /var/cache/conftool/dbconfig/20220331-012858-ladsgroup.json
  • 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298557)', diff saved to https://phabricator.wikimedia.org/P23944 and previous config saved to /var/cache/conftool/dbconfig/20220331-012734-marostegui.json
  • 01:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 01:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P23943 and previous config saved to /var/cache/conftool/dbconfig/20220331-012726-marostegui.json
  • 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23942 and previous config saved to /var/cache/conftool/dbconfig/20220331-012650-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23941 and previous config saved to /var/cache/conftool/dbconfig/20220331-012637-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23940 and previous config saved to /var/cache/conftool/dbconfig/20220331-012120-ladsgroup.json
  • 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23939 and previous config saved to /var/cache/conftool/dbconfig/20220331-011221-marostegui.json
  • 01:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23938 and previous config saved to /var/cache/conftool/dbconfig/20220331-011132-ladsgroup.json
  • 00:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23937 and previous config saved to /var/cache/conftool/dbconfig/20220331-005716-marostegui.json
  • 00:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23936 and previous config saved to /var/cache/conftool/dbconfig/20220331-005627-ladsgroup.json
  • 00:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P23935 and previous config saved to /var/cache/conftool/dbconfig/20220331-004211-marostegui.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23934 and previous config saved to /var/cache/conftool/dbconfig/20220331-004122-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23933 and previous config saved to /var/cache/conftool/dbconfig/20220331-003914-ladsgroup.json
  • 00:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23932 and previous config saved to /var/cache/conftool/dbconfig/20220331-003906-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23931 and previous config saved to /var/cache/conftool/dbconfig/20220331-003834-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23930 and previous config saved to /var/cache/conftool/dbconfig/20220331-003826-ladsgroup.json
  • 00:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23929 and previous config saved to /var/cache/conftool/dbconfig/20220331-002401-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23928 and previous config saved to /var/cache/conftool/dbconfig/20220331-002321-ladsgroup.json
  • 00:17 rzl: rzl@apt1001:~$ sudo -i reprepro -C main include buster-wikimedia /home/rzl/httpbb/buster/httpbb_0.0.1-1_source.changes # T299705
  • 00:13 eileen: revision changed from 951ffb1d to 4bb3ec09
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23927 and previous config saved to /var/cache/conftool/dbconfig/20220331-000856-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23926 and previous config saved to /var/cache/conftool/dbconfig/20220331-000816-ladsgroup.json

2022-03-30

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23925 and previous config saved to /var/cache/conftool/dbconfig/20220330-235351-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23924 and previous config saved to /var/cache/conftool/dbconfig/20220330-235311-ladsgroup.json
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23923 and previous config saved to /var/cache/conftool/dbconfig/20220330-235143-ladsgroup.json
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23922 and previous config saved to /var/cache/conftool/dbconfig/20220330-235131-ladsgroup.json
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23921 and previous config saved to /var/cache/conftool/dbconfig/20220330-233625-ladsgroup.json
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23920 and previous config saved to /var/cache/conftool/dbconfig/20220330-232120-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23919 and previous config saved to /var/cache/conftool/dbconfig/20220330-230914-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23918 and previous config saved to /var/cache/conftool/dbconfig/20220330-230905-ladsgroup.json
  • 23:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298557)', diff saved to https://phabricator.wikimedia.org/P23917 and previous config saved to /var/cache/conftool/dbconfig/20220330-230803-marostegui.json
  • 23:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298557)', diff saved to https://phabricator.wikimedia.org/P23916 and previous config saved to /var/cache/conftool/dbconfig/20220330-230755-marostegui.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23915 and previous config saved to /var/cache/conftool/dbconfig/20220330-230615-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23914 and previous config saved to /var/cache/conftool/dbconfig/20220330-230408-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23913 and previous config saved to /var/cache/conftool/dbconfig/20220330-230336-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23912 and previous config saved to /var/cache/conftool/dbconfig/20220330-225401-ladsgroup.json
  • 22:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23911 and previous config saved to /var/cache/conftool/dbconfig/20220330-225250-marostegui.json
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23910 and previous config saved to /var/cache/conftool/dbconfig/20220330-224831-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23909 and previous config saved to /var/cache/conftool/dbconfig/20220330-223856-ladsgroup.json
  • 22:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23908 and previous config saved to /var/cache/conftool/dbconfig/20220330-223745-marostegui.json
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23907 and previous config saved to /var/cache/conftool/dbconfig/20220330-223325-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23906 and previous config saved to /var/cache/conftool/dbconfig/20220330-222351-ladsgroup.json
  • 22:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298557)', diff saved to https://phabricator.wikimedia.org/P23905 and previous config saved to /var/cache/conftool/dbconfig/20220330-222240-marostegui.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23904 and previous config saved to /var/cache/conftool/dbconfig/20220330-221820-ladsgroup.json
  • 22:15 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 21:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:21 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23903 and previous config saved to /var/cache/conftool/dbconfig/20220330-211806-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23902 and previous config saved to /var/cache/conftool/dbconfig/20220330-211758-ladsgroup.json
  • 21:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:03 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23900 and previous config saved to /var/cache/conftool/dbconfig/20220330-210253-ladsgroup.json
  • 20:56 ejegg: updated fundraising python tools from 8f5119f6 to af97fc4a
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23899 and previous config saved to /var/cache/conftool/dbconfig/20220330-205529-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23898 and previous config saved to /var/cache/conftool/dbconfig/20220330-205521-ladsgroup.json
  • 20:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23897 and previous config saved to /var/cache/conftool/dbconfig/20220330-204748-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23896 and previous config saved to /var/cache/conftool/dbconfig/20220330-204016-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23895 and previous config saved to /var/cache/conftool/dbconfig/20220330-203243-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23894 and previous config saved to /var/cache/conftool/dbconfig/20220330-203035-ladsgroup.json
  • 20:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23893 and previous config saved to /var/cache/conftool/dbconfig/20220330-203028-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23892 and previous config saved to /var/cache/conftool/dbconfig/20220330-202511-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23891 and previous config saved to /var/cache/conftool/dbconfig/20220330-201522-ladsgroup.json
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23890 and previous config saved to /var/cache/conftool/dbconfig/20220330-201006-ladsgroup.json
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298557)', diff saved to https://phabricator.wikimedia.org/P23889 and previous config saved to /var/cache/conftool/dbconfig/20220330-200236-marostegui.json
  • 20:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 20:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 20:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298557)', diff saved to https://phabricator.wikimedia.org/P23888 and previous config saved to /var/cache/conftool/dbconfig/20220330-200229-marostegui.json
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23887 and previous config saved to /var/cache/conftool/dbconfig/20220330-200017-ladsgroup.json
  • 19:56 razzi@cumin1001: END (PASS) - Cookbook sre.kafka.reboot-workers (exit_code=0) for Kafka test-eqiad cluster: Reboot kafka nodes
  • 19:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23886 and previous config saved to /var/cache/conftool/dbconfig/20220330-194723-marostegui.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23885 and previous config saved to /var/cache/conftool/dbconfig/20220330-194512-ladsgroup.json
  • 19:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23884 and previous config saved to /var/cache/conftool/dbconfig/20220330-193218-marostegui.json
  • 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23883 and previous config saved to /var/cache/conftool/dbconfig/20220330-192355-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23882 and previous config saved to /var/cache/conftool/dbconfig/20220330-192347-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298557)', diff saved to https://phabricator.wikimedia.org/P23881 and previous config saved to /var/cache/conftool/dbconfig/20220330-191713-marostegui.json
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23880 and previous config saved to /var/cache/conftool/dbconfig/20220330-190842-ladsgroup.json
  • 18:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23879 and previous config saved to /var/cache/conftool/dbconfig/20220330-185337-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23878 and previous config saved to /var/cache/conftool/dbconfig/20220330-184458-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23877 and previous config saved to /var/cache/conftool/dbconfig/20220330-184445-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23876 and previous config saved to /var/cache/conftool/dbconfig/20220330-183832-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23875 and previous config saved to /var/cache/conftool/dbconfig/20220330-182940-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23874 and previous config saved to /var/cache/conftool/dbconfig/20220330-182537-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 18:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23873 and previous config saved to /var/cache/conftool/dbconfig/20220330-181435-ladsgroup.json
  • 18:11 razzi@cumin1001: START - Cookbook sre.kafka.reboot-workers for Kafka test-eqiad cluster: Reboot kafka nodes
  • 18:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
  • 18:03 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:01 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 18:00 razzi@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host zookeeper-test1002.eqiad.wmnet
  • 18:00 razzi@cumin1001: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
  • 18:00 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23872 and previous config saved to /var/cache/conftool/dbconfig/20220330-175930-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23871 and previous config saved to /var/cache/conftool/dbconfig/20220330-175822-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23870 and previous config saved to /var/cache/conftool/dbconfig/20220330-175814-ladsgroup.json
  • 17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS stretch
  • 17:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298557)', diff saved to https://phabricator.wikimedia.org/P23869 and previous config saved to /var/cache/conftool/dbconfig/20220330-174426-marostegui.json
  • 17:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 17:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 17:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298557)', diff saved to https://phabricator.wikimedia.org/P23868 and previous config saved to /var/cache/conftool/dbconfig/20220330-174418-marostegui.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23867 and previous config saved to /var/cache/conftool/dbconfig/20220330-174309-ladsgroup.json
  • 17:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23866 and previous config saved to /var/cache/conftool/dbconfig/20220330-172913-marostegui.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23865 and previous config saved to /var/cache/conftool/dbconfig/20220330-172804-ladsgroup.json
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T297189)', diff saved to https://phabricator.wikimedia.org/P23864 and previous config saved to /var/cache/conftool/dbconfig/20220330-171732-marostegui.json
  • 17:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23862 and previous config saved to /var/cache/conftool/dbconfig/20220330-171408-marostegui.json
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23861 and previous config saved to /var/cache/conftool/dbconfig/20220330-171259-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P23859 and previous config saved to /var/cache/conftool/dbconfig/20220330-170227-marostegui.json
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23858 and previous config saved to /var/cache/conftool/dbconfig/20220330-170150-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23857 and previous config saved to /var/cache/conftool/dbconfig/20220330-170142-ladsgroup.json
  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298557)', diff saved to https://phabricator.wikimedia.org/P23856 and previous config saved to /var/cache/conftool/dbconfig/20220330-165903-marostegui.json
  • 16:52 topranks: "Manually decommissioning xe-0/0/1 on lsw1-e2-eqiad before reimage of ms-be1069 from scratch, attempt to replicate ARP error seen previously while running debug."
  • 16:52 volans: sudo systemctl reload icinga.service on alert1001
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P23855 and previous config saved to /var/cache/conftool/dbconfig/20220330-164722-marostegui.json
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23854 and previous config saved to /var/cache/conftool/dbconfig/20220330-164637-ladsgroup.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T297189)', diff saved to https://phabricator.wikimedia.org/P23853 and previous config saved to /var/cache/conftool/dbconfig/20220330-163217-marostegui.json
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23852 and previous config saved to /var/cache/conftool/dbconfig/20220330-163132-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 16:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 16:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 16:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23850 and previous config saved to /var/cache/conftool/dbconfig/20220330-161626-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23849 and previous config saved to /var/cache/conftool/dbconfig/20220330-161418-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23848 and previous config saved to /var/cache/conftool/dbconfig/20220330-161337-ladsgroup.json
  • 16:04 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab1001.wikimedia.org
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23847 and previous config saved to /var/cache/conftool/dbconfig/20220330-155832-ladsgroup.json
  • 15:52 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab1001.wikimedia.org
  • 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T297189)', diff saved to https://phabricator.wikimedia.org/P23845 and previous config saved to /var/cache/conftool/dbconfig/20220330-155139-marostegui.json
  • 15:51 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner2001.codfw.wmnet
  • 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P23844 and previous config saved to /var/cache/conftool/dbconfig/20220330-155126-marostegui.json
  • 15:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:47 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner2001.codfw.wmnet
  • 15:46 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM gitlab-runner1001.eqiad.wmnet
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23843 and previous config saved to /var/cache/conftool/dbconfig/20220330-154326-ladsgroup.json
  • 15:43 jelto@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab-runner1001.eqiad.wmnet
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P23842 and previous config saved to /var/cache/conftool/dbconfig/20220330-153621-marostegui.json
  • 15:32 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23841 and previous config saved to /var/cache/conftool/dbconfig/20220330-152821-ladsgroup.json
  • 15:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23840 and previous config saved to /var/cache/conftool/dbconfig/20220330-152613-ladsgroup.json
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23839 and previous config saved to /var/cache/conftool/dbconfig/20220330-152539-ladsgroup.json
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P23838 and previous config saved to /var/cache/conftool/dbconfig/20220330-152116-marostegui.json
  • 15:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-client1001.eqiad.wmnet
  • 15:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-client1001.eqiad.wmnet
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 15:15 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298557)', diff saved to https://phabricator.wikimedia.org/P23837 and previous config saved to /var/cache/conftool/dbconfig/20220330-151346-marostegui.json
  • 15:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 15:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 15:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298557)', diff saved to https://phabricator.wikimedia.org/P23836 and previous config saved to /var/cache/conftool/dbconfig/20220330-151338-marostegui.json
  • 15:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23835 and previous config saved to /var/cache/conftool/dbconfig/20220330-151034-ladsgroup.json
  • 15:10 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P23834 and previous config saved to /var/cache/conftool/dbconfig/20220330-150611-marostegui.json
  • 15:05 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 15:01 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 14:59 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23833 and previous config saved to /var/cache/conftool/dbconfig/20220330-145833-marostegui.json
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
  • 14:56 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 14:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23831 and previous config saved to /var/cache/conftool/dbconfig/20220330-145529-ladsgroup.json
  • 14:55 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 14:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
  • 14:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2093.codfw.wmnet with OS bullseye
  • 14:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 14:47 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2001.wikimedia.org
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23830 and previous config saved to /var/cache/conftool/dbconfig/20220330-144328-marostegui.json
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23829 and previous config saved to /var/cache/conftool/dbconfig/20220330-144023-ladsgroup.json
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2001.wikimedia.org
  • 14:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 14:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23828 and previous config saved to /var/cache/conftool/dbconfig/20220330-143252-ladsgroup.json
  • 14:32 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2093.codfw.wmnet with reason: host reimage
  • 14:31 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 14:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2093.codfw.wmnet with reason: host reimage
  • 14:29 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 14:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298557)', diff saved to https://phabricator.wikimedia.org/P23827 and previous config saved to /var/cache/conftool/dbconfig/20220330-142823-marostegui.json
  • 14:25 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 14:22 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 14:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:19 moritzm: installing remaining tiff security updates
  • 14:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23826 and previous config saved to /var/cache/conftool/dbconfig/20220330-141747-ladsgroup.json
  • 14:15 hashar: deploy1002: `git fetch && git rebase` to catch up with `group1 wikis to 1.39.0-wmf.5` commit which did not get sent to Gerrit but got deployed earlier today
  • 14:13 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.reimage for host db2093.codfw.wmnet with OS bullseye
  • 14:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T297189)', diff saved to https://phabricator.wikimedia.org/P23825 and previous config saved to /var/cache/conftool/dbconfig/20220330-140556-marostegui.json
  • 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T297189)', diff saved to https://phabricator.wikimedia.org/P23824 and previous config saved to /var/cache/conftool/dbconfig/20220330-140549-marostegui.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23823 and previous config saved to /var/cache/conftool/dbconfig/20220330-140242-ladsgroup.json
  • 14:01 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 13:59 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 13:55 kormat: stopping orchestrator for backend move T301315
  • 13:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 13:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 13:51 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P23822 and previous config saved to /var/cache/conftool/dbconfig/20220330-135044-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23821 and previous config saved to /var/cache/conftool/dbconfig/20220330-134737-ladsgroup.json
  • 13:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23820 and previous config saved to /var/cache/conftool/dbconfig/20220330-134010-ladsgroup.json
  • 13:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23819 and previous config saved to /var/cache/conftool/dbconfig/20220330-134002-ladsgroup.json
  • 13:36 jayme: restarting pybal on lvs1019 and lvs2009
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P23818 and previous config saved to /var/cache/conftool/dbconfig/20220330-133538-marostegui.json
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23817 and previous config saved to /var/cache/conftool/dbconfig/20220330-133436-ladsgroup.json
  • 13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23816 and previous config saved to /var/cache/conftool/dbconfig/20220330-133423-ladsgroup.json
  • 13:33 jayme: restarting pybal on lvs1020 and lvs2010
  • 13:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 13:30 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 13:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23815 and previous config saved to /var/cache/conftool/dbconfig/20220330-132457-ladsgroup.json
  • 13:22 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T297189)', diff saved to https://phabricator.wikimedia.org/P23814 and previous config saved to /var/cache/conftool/dbconfig/20220330-132033-marostegui.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23813 and previous config saved to /var/cache/conftool/dbconfig/20220330-131918-ladsgroup.json
  • 13:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 13:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23812 and previous config saved to /var/cache/conftool/dbconfig/20220330-130952-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23811 and previous config saved to /var/cache/conftool/dbconfig/20220330-130413-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23810 and previous config saved to /var/cache/conftool/dbconfig/20220330-125447-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23809 and previous config saved to /var/cache/conftool/dbconfig/20220330-125239-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23808 and previous config saved to /var/cache/conftool/dbconfig/20220330-125201-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23807 and previous config saved to /var/cache/conftool/dbconfig/20220330-124908-ladsgroup.json
  • 12:41 Amir1: start of templatelinks backfill on s3 (T299424)
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298557)', diff saved to https://phabricator.wikimedia.org/P23806 and previous config saved to /var/cache/conftool/dbconfig/20220330-123931-marostegui.json
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 12:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23805 and previous config saved to /var/cache/conftool/dbconfig/20220330-123656-ladsgroup.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T297189)', diff saved to https://phabricator.wikimedia.org/P23804 and previous config saved to /var/cache/conftool/dbconfig/20220330-123249-marostegui.json
  • 12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:27 mmandere: pool cp2028 with HAProxy as TLS termination layer - T290005
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23802 and previous config saved to /var/cache/conftool/dbconfig/20220330-122151-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23801 and previous config saved to /var/cache/conftool/dbconfig/20220330-120646-ladsgroup.json
  • 12:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2028.codfw.wmnet with OS buster
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23800 and previous config saved to /var/cache/conftool/dbconfig/20220330-120439-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23799 and previous config saved to /var/cache/conftool/dbconfig/20220330-120426-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23798 and previous config saved to /var/cache/conftool/dbconfig/20220330-115839-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23797 and previous config saved to /var/cache/conftool/dbconfig/20220330-115831-ladsgroup.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23796 and previous config saved to /var/cache/conftool/dbconfig/20220330-114921-ladsgroup.json
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 6 hosts with reason: Maintenance
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23795 and previous config saved to /var/cache/conftool/dbconfig/20220330-114326-ladsgroup.json
  • 11:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2028.codfw.wmnet with reason: host reimage
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23794 and previous config saved to /var/cache/conftool/dbconfig/20220330-113416-ladsgroup.json
  • 11:30 moritzm: updating libapache2-mod-auth-cas on buster hosts
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23793 and previous config saved to /var/cache/conftool/dbconfig/20220330-112821-ladsgroup.json
  • 11:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2028.codfw.wmnet with OS buster
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23792 and previous config saved to /var/cache/conftool/dbconfig/20220330-111911-ladsgroup.json
  • 11:19 XioNoX: apply uRPF strict filter to eqiad cloud-hosts vlan - T285461
  • 11:15 mmandere: depool cp2028 for reimage - T290005
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23791 and previous config saved to /var/cache/conftool/dbconfig/20220330-111316-ladsgroup.json
  • 11:12 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 11:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23790 and previous config saved to /var/cache/conftool/dbconfig/20220330-110701-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23789 and previous config saved to /var/cache/conftool/dbconfig/20220330-110654-ladsgroup.json
  • 11:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 12 hosts with reason: Maintenance
  • 11:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 12 hosts with reason: Maintenance
  • 11:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:59 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1009.eqiad.wmnet
  • 10:52 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1009.eqiad.wmnet
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Maintenance
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300775)', diff saved to https://phabricator.wikimedia.org/P23788 and previous config saved to /var/cache/conftool/dbconfig/20220330-105210-marostegui.json
  • 10:52 moritzm: installing glibc updates from Bullseye 11.3 point release
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23787 and previous config saved to /var/cache/conftool/dbconfig/20220330-105149-ladsgroup.json
  • 10:40 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1008.eqiad.wmnet
  • 10:38 mmandere: pool cp2030 with HAProxy as TLS termination layer - T290005
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P23786 and previous config saved to /var/cache/conftool/dbconfig/20220330-103705-marostegui.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23785 and previous config saved to /var/cache/conftool/dbconfig/20220330-103644-ladsgroup.json
  • 10:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1008.eqiad.wmnet
  • 10:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T297189)', diff saved to https://phabricator.wikimedia.org/P23784 and previous config saved to /var/cache/conftool/dbconfig/20220330-102701-marostegui.json
  • 10:26 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2030.codfw.wmnet with OS buster
  • 10:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1007.eqiad.wmnet
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P23783 and previous config saved to /var/cache/conftool/dbconfig/20220330-102200-marostegui.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23782 and previous config saved to /var/cache/conftool/dbconfig/20220330-102138-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23781 and previous config saved to /var/cache/conftool/dbconfig/20220330-101931-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23780 and previous config saved to /var/cache/conftool/dbconfig/20220330-101918-ladsgroup.json
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23779 and previous config saved to /var/cache/conftool/dbconfig/20220330-101847-ladsgroup.json
  • 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 10:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23778 and previous config saved to /var/cache/conftool/dbconfig/20220330-101839-ladsgroup.json
  • 10:14 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1007.eqiad.wmnet
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P23777 and previous config saved to /var/cache/conftool/dbconfig/20220330-101156-marostegui.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300775)', diff saved to https://phabricator.wikimedia.org/P23776 and previous config saved to /var/cache/conftool/dbconfig/20220330-100654-marostegui.json
  • 10:06 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1006.eqiad.wmnet
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23775 and previous config saved to /var/cache/conftool/dbconfig/20220330-100413-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23774 and previous config saved to /var/cache/conftool/dbconfig/20220330-100333-ladsgroup.json
  • 10:03 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
  • 10:01 XioNoX: cumin1001:~$ sudo cumin 'ganeti[1005-1028].eqiad.wmnet' 'sysctl -w net.ipv6.conf.analytics.accept_ra=0' - T305034
  • 09:59 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2030.codfw.wmnet with reason: host reimage
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P23773 and previous config saved to /var/cache/conftool/dbconfig/20220330-095651-marostegui.json
  • 09:55 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1006.eqiad.wmnet
  • 09:51 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1005.eqiad.wmnet
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23772 and previous config saved to /var/cache/conftool/dbconfig/20220330-094908-ladsgroup.json
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp1001.wikimedia.org
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23771 and previous config saved to /var/cache/conftool/dbconfig/20220330-094829-ladsgroup.json
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1001.wikimedia.org
  • 09:43 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1005.eqiad.wmnet
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T297189)', diff saved to https://phabricator.wikimedia.org/P23770 and previous config saved to /var/cache/conftool/dbconfig/20220330-094146-marostegui.json
  • 09:40 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2030.codfw.wmnet with OS buster
  • 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:35 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1004.eqiad.wmnet
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23769 and previous config saved to /var/cache/conftool/dbconfig/20220330-093403-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23768 and previous config saved to /var/cache/conftool/dbconfig/20220330-093324-ladsgroup.json
  • 09:32 mmandere: depool cp2030 for reimage - T290005
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23767 and previous config saved to /var/cache/conftool/dbconfig/20220330-093156-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23766 and previous config saved to /var/cache/conftool/dbconfig/20220330-093148-ladsgroup.json
  • 09:27 XioNoX: ganeti1025:~$ sudo sysctl -w net.ipv6.conf.analytics.accept_ra=0 - T305034
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:26 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ores1004.eqiad.wmnet
  • 09:26 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:25 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ores1004.eqiad.wmnet
  • 09:25 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:25 klausman@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ores1004.eqiad.wmnet
  • 09:24 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1004.eqiad.wmnet
  • 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2003.wikimedia.org
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23765 and previous config saved to /var/cache/conftool/dbconfig/20220330-091643-ladsgroup.json
  • 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2003.wikimedia.org
  • 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1003.wikimedia.org
  • 09:09 mmandere: pool cp2032 with HAProxy as TLS termination layer - T290005
  • 09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1003.wikimedia.org
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23764 and previous config saved to /var/cache/conftool/dbconfig/20220330-090138-ladsgroup.json
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6001.wikimedia.org
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5001.wikimedia.org
  • 08:53 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2032.codfw.wmnet with OS buster
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6001.wikimedia.org
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5001.wikimedia.org
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23763 and previous config saved to /var/cache/conftool/dbconfig/20220330-085010-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23762 and previous config saved to /var/cache/conftool/dbconfig/20220330-085003-ladsgroup.json
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4001.wikimedia.org
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23761 and previous config saved to /var/cache/conftool/dbconfig/20220330-084633-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23760 and previous config saved to /var/cache/conftool/dbconfig/20220330-084425-ladsgroup.json
  • 08:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23759 and previous config saved to /var/cache/conftool/dbconfig/20220330-084353-ladsgroup.json
  • 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3001.wikimedia.org
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4001.wikimedia.org
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T297189)', diff saved to https://phabricator.wikimedia.org/P23758 and previous config saved to /var/cache/conftool/dbconfig/20220330-083826-marostegui.json
  • 08:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3001.wikimedia.org
  • 08:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T297189)', diff saved to https://phabricator.wikimedia.org/P23757 and previous config saved to /var/cache/conftool/dbconfig/20220330-083819-marostegui.json
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23756 and previous config saved to /var/cache/conftool/dbconfig/20220330-083458-ladsgroup.json
  • 08:33 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
  • 08:30 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2032.codfw.wmnet with reason: host reimage
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23755 and previous config saved to /var/cache/conftool/dbconfig/20220330-082848-ladsgroup.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P23754 and previous config saved to /var/cache/conftool/dbconfig/20220330-082314-marostegui.json
  • 08:20 XioNoX: temporarily apply log-only RPF filter on eqiad analytics-a
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23753 and previous config saved to /var/cache/conftool/dbconfig/20220330-081952-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23752 and previous config saved to /var/cache/conftool/dbconfig/20220330-081343-ladsgroup.json
  • 08:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298557)', diff saved to https://phabricator.wikimedia.org/P23751 and previous config saved to /var/cache/conftool/dbconfig/20220330-081128-marostegui.json
  • 08:11 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2032.codfw.wmnet with OS buster
  • 08:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.5 (duration: 01m 00s)
  • 08:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.5
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P23750 and previous config saved to /var/cache/conftool/dbconfig/20220330-080808-marostegui.json
  • 08:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1004.eqiad.wmnet
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23749 and previous config saved to /var/cache/conftool/dbconfig/20220330-080447-ladsgroup.json
  • 08:03 mmandere: depool cp2032 for reimage - T290005
  • 08:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1003.eqiad.wmnet
  • 08:02 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1004.eqiad.wmnet
  • 08:00 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter1003.eqiad.wmnet
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23748 and previous config saved to /var/cache/conftool/dbconfig/20220330-075838-ladsgroup.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23747 and previous config saved to /var/cache/conftool/dbconfig/20220330-075623-marostegui.json
  • 07:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1003.eqiad.wmnet
  • 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter1003.eqiad.wmnet
  • 07:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2004.codfw.wmnet
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T297189)', diff saved to https://phabricator.wikimedia.org/P23746 and previous config saved to /var/cache/conftool/dbconfig/20220330-075303-marostegui.json
  • 07:50 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2004.codfw.wmnet
  • 07:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1002.eqiad.wmnet
  • 07:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host orespoolcounter2003.codfw.wmnet
  • 07:44 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host orespoolcounter2003.codfw.wmnet
  • 07:42 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1002.eqiad.wmnet
  • 07:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores1001.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23745 and previous config saved to /var/cache/conftool/dbconfig/20220330-074118-marostegui.json
  • 07:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 07:33 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores1001.eqiad.wmnet
  • 07:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 07:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 07:31 moritzm: updating libapache2-mod-auth-cas on bullseye hosts
  • 07:27 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 07:26 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298557)', diff saved to https://phabricator.wikimedia.org/P23744 and previous config saved to /var/cache/conftool/dbconfig/20220330-072613-marostegui.json
  • 07:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23743 and previous config saved to /var/cache/conftool/dbconfig/20220330-072045-ladsgroup.json
  • 07:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23742 and previous config saved to /var/cache/conftool/dbconfig/20220330-072037-ladsgroup.json
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T297189)', diff saved to https://phabricator.wikimedia.org/P23741 and previous config saved to /var/cache/conftool/dbconfig/20220330-071650-marostegui.json
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:15 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 07:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2009.codfw.wmnet
  • 07:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2009.codfw.wmnet
  • 07:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2008.codfw.wmnet
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:08 taavi: UTC morning deploys done
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Realtime Preview on testwiki (T302506) (duration: 00m 56s)
  • 07:06 elukey: restart rsyslog on ml-serve1002
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23740 and previous config saved to /var/cache/conftool/dbconfig/20220330-070604-root.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23739 and previous config saved to /var/cache/conftool/dbconfig/20220330-070532-ladsgroup.json
  • 07:03 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2008.codfw.wmnet
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23738 and previous config saved to /var/cache/conftool/dbconfig/20220330-065822-ladsgroup.json
  • 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23737 and previous config saved to /var/cache/conftool/dbconfig/20220330-065814-ladsgroup.json
  • 06:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2007.codfw.wmnet
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23736 and previous config saved to /var/cache/conftool/dbconfig/20220330-065100-root.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23735 and previous config saved to /var/cache/conftool/dbconfig/20220330-065027-ladsgroup.json
  • 06:49 jayme: updated scap to 4.5.0 on all hosts - T304134
  • 06:48 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2007.codfw.wmnet
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23734 and previous config saved to /var/cache/conftool/dbconfig/20220330-064309-ladsgroup.json
  • 06:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23733 and previous config saved to /var/cache/conftool/dbconfig/20220330-064037-root.json
  • 06:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2006.codfw.wmnet
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23732 and previous config saved to /var/cache/conftool/dbconfig/20220330-063556-root.json
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23731 and previous config saved to /var/cache/conftool/dbconfig/20220330-063522-ladsgroup.json
  • 06:35 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 06:34 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-serve1001.eqiad.wmnet
  • 06:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 06:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2006.codfw.wmnet
  • 06:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2005.codfw.wmnet
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23730 and previous config saved to /var/cache/conftool/dbconfig/20220330-062804-ladsgroup.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23729 and previous config saved to /var/cache/conftool/dbconfig/20220330-062533-root.json
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23728 and previous config saved to /var/cache/conftool/dbconfig/20220330-062203-ladsgroup.json
  • 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23727 and previous config saved to /var/cache/conftool/dbconfig/20220330-062155-ladsgroup.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23726 and previous config saved to /var/cache/conftool/dbconfig/20220330-062052-root.json
  • 06:20 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2005.codfw.wmnet
  • 06:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2004.codfw.wmnet
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23725 and previous config saved to /var/cache/conftool/dbconfig/20220330-061259-ladsgroup.json
  • 06:11 elukey: restart rsyslogd on ml-serve1001
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23724 and previous config saved to /var/cache/conftool/dbconfig/20220330-061051-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23723 and previous config saved to /var/cache/conftool/dbconfig/20220330-061042-ladsgroup.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23722 and previous config saved to /var/cache/conftool/dbconfig/20220330-061029-root.json
  • 06:07 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2004.codfw.wmnet
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23721 and previous config saved to /var/cache/conftool/dbconfig/20220330-060650-ladsgroup.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23720 and previous config saved to /var/cache/conftool/dbconfig/20220330-060548-root.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23719 and previous config saved to /var/cache/conftool/dbconfig/20220330-055537-ladsgroup.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23718 and previous config saved to /var/cache/conftool/dbconfig/20220330-055525-root.json
  • 05:51 marostegui: dbmaint s6@eqiad T297189
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23717 and previous config saved to /var/cache/conftool/dbconfig/20220330-055145-ladsgroup.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P23716 and previous config saved to /var/cache/conftool/dbconfig/20220330-055045-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298557)', diff saved to https://phabricator.wikimedia.org/P23715 and previous config saved to /var/cache/conftool/dbconfig/20220330-054548-marostegui.json
  • 05:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 05:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23714 and previous config saved to /var/cache/conftool/dbconfig/20220330-054032-ladsgroup.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23713 and previous config saved to /var/cache/conftool/dbconfig/20220330-054021-root.json
  • 05:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P23712 and previous config saved to /var/cache/conftool/dbconfig/20220330-053745-root.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23711 and previous config saved to /var/cache/conftool/dbconfig/20220330-053640-ladsgroup.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23710 and previous config saved to /var/cache/conftool/dbconfig/20220330-052525-ladsgroup.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23709 and previous config saved to /var/cache/conftool/dbconfig/20220330-052516-root.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23708 and previous config saved to /var/cache/conftool/dbconfig/20220330-052344-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23707 and previous config saved to /var/cache/conftool/dbconfig/20220330-052320-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23706 and previous config saved to /var/cache/conftool/dbconfig/20220330-052312-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23705 and previous config saved to /var/cache/conftool/dbconfig/20220330-052259-ladsgroup.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: After kernel upgrade', diff saved to https://phabricator.wikimedia.org/P23704 and previous config saved to /var/cache/conftool/dbconfig/20220330-052241-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for reboot', diff saved to https://phabricator.wikimedia.org/P23703 and previous config saved to /var/cache/conftool/dbconfig/20220330-051524-root.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23702 and previous config saved to /var/cache/conftool/dbconfig/20220330-051012-root.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23701 and previous config saved to /var/cache/conftool/dbconfig/20220330-050808-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23700 and previous config saved to /var/cache/conftool/dbconfig/20220330-050754-ladsgroup.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 for downgrade', diff saved to https://phabricator.wikimedia.org/P23699 and previous config saved to /var/cache/conftool/dbconfig/20220330-050406-root.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23698 and previous config saved to /var/cache/conftool/dbconfig/20220330-045747-marostegui.json
  • 04:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 04:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23697 and previous config saved to /var/cache/conftool/dbconfig/20220330-045303-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23696 and previous config saved to /var/cache/conftool/dbconfig/20220330-045249-ladsgroup.json
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23695 and previous config saved to /var/cache/conftool/dbconfig/20220330-043758-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23694 and previous config saved to /var/cache/conftool/dbconfig/20220330-043744-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23693 and previous config saved to /var/cache/conftool/dbconfig/20220330-043536-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 04:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23692 and previous config saved to /var/cache/conftool/dbconfig/20220330-043528-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23691 and previous config saved to /var/cache/conftool/dbconfig/20220330-042443-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23690 and previous config saved to /var/cache/conftool/dbconfig/20220330-042435-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23689 and previous config saved to /var/cache/conftool/dbconfig/20220330-042023-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23688 and previous config saved to /var/cache/conftool/dbconfig/20220330-040930-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23687 and previous config saved to /var/cache/conftool/dbconfig/20220330-040518-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23686 and previous config saved to /var/cache/conftool/dbconfig/20220330-035425-ladsgroup.json
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23685 and previous config saved to /var/cache/conftool/dbconfig/20220330-035013-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23684 and previous config saved to /var/cache/conftool/dbconfig/20220330-033920-ladsgroup.json
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23683 and previous config saved to /var/cache/conftool/dbconfig/20220330-032617-ladsgroup.json
  • 03:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 03:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23682 and previous config saved to /var/cache/conftool/dbconfig/20220330-032610-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23681 and previous config saved to /var/cache/conftool/dbconfig/20220330-032201-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 03:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23680 and previous config saved to /var/cache/conftool/dbconfig/20220330-032154-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23679 and previous config saved to /var/cache/conftool/dbconfig/20220330-031105-ladsgroup.json
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23678 and previous config saved to /var/cache/conftool/dbconfig/20220330-030649-ladsgroup.json
  • 02:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23677 and previous config saved to /var/cache/conftool/dbconfig/20220330-025600-ladsgroup.json
  • 02:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff not saved (Unable to send diff to phaste); previous config saved to /var/cache/conftool/dbconfig/20220330-025139-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23676 and previous config saved to /var/cache/conftool/dbconfig/20220330-024055-ladsgroup.json
  • 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23675 and previous config saved to /var/cache/conftool/dbconfig/20220330-023634-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23674 and previous config saved to /var/cache/conftool/dbconfig/20220330-023426-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23673 and previous config saved to /var/cache/conftool/dbconfig/20220330-023344-ladsgroup.json
  • 02:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23672 and previous config saved to /var/cache/conftool/dbconfig/20220330-021839-ladsgroup.json
  • 02:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300775)', diff saved to https://phabricator.wikimedia.org/P23671 and previous config saved to /var/cache/conftool/dbconfig/20220330-021111-marostegui.json
  • 02:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 02:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300775)', diff saved to https://phabricator.wikimedia.org/P23670 and previous config saved to /var/cache/conftool/dbconfig/20220330-021058-marostegui.json
  • 02:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23669 and previous config saved to /var/cache/conftool/dbconfig/20220330-020334-ladsgroup.json
  • 01:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P23668 and previous config saved to /var/cache/conftool/dbconfig/20220330-015552-marostegui.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23667 and previous config saved to /var/cache/conftool/dbconfig/20220330-015527-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23666 and previous config saved to /var/cache/conftool/dbconfig/20220330-015519-ladsgroup.json
  • 01:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23665 and previous config saved to /var/cache/conftool/dbconfig/20220330-014829-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23664 and previous config saved to /var/cache/conftool/dbconfig/20220330-014621-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23663 and previous config saved to /var/cache/conftool/dbconfig/20220330-014549-ladsgroup.json
  • 01:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P23662 and previous config saved to /var/cache/conftool/dbconfig/20220330-014047-marostegui.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23661 and previous config saved to /var/cache/conftool/dbconfig/20220330-014014-ladsgroup.json
  • 01:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23660 and previous config saved to /var/cache/conftool/dbconfig/20220330-013044-ladsgroup.json
  • 01:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300775)', diff saved to https://phabricator.wikimedia.org/P23659 and previous config saved to /var/cache/conftool/dbconfig/20220330-012542-marostegui.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23658 and previous config saved to /var/cache/conftool/dbconfig/20220330-012509-ladsgroup.json
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23657 and previous config saved to /var/cache/conftool/dbconfig/20220330-011539-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23656 and previous config saved to /var/cache/conftool/dbconfig/20220330-011004-ladsgroup.json
  • 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23655 and previous config saved to /var/cache/conftool/dbconfig/20220330-010034-ladsgroup.json
  • 00:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23654 and previous config saved to /var/cache/conftool/dbconfig/20220330-002523-ladsgroup.json
  • 00:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 00:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23653 and previous config saved to /var/cache/conftool/dbconfig/20220330-002515-ladsgroup.json
  • 00:24 catrope@deploy1002: Finished scap: Update Kashmiri namespace names (T304790) (duration: 12m 29s)
  • 00:12 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:12 catrope@deploy1002: scap failed: RuntimeError Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 00m 28s)
  • 00:11 catrope@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 00:11 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:10 catrope@deploy1002: scap failed: RuntimeError Scap failed!: 8/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 00m 28s)
  • 00:10 catrope@deploy1002: Scap failed!: 8/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23652 and previous config saved to /var/cache/conftool/dbconfig/20220330-001010-ladsgroup.json
  • 00:09 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:07 catrope@deploy1002: scap failed: RuntimeError Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back. (duration: 04m 32s)
  • 00:07 catrope@deploy1002: Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 00:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:02 catrope@deploy1002: Started scap: Update Kashmiri namespace names (T304790)
  • 00:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23651 and previous config saved to /var/cache/conftool/dbconfig/20220330-000019-ladsgroup.json
  • 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23650 and previous config saved to /var/cache/conftool/dbconfig/20220330-000011-ladsgroup.json

2022-03-29

  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23649 and previous config saved to /var/cache/conftool/dbconfig/20220329-235505-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23648 and previous config saved to /var/cache/conftool/dbconfig/20220329-234506-ladsgroup.json
  • 23:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23647 and previous config saved to /var/cache/conftool/dbconfig/20220329-234000-ladsgroup.json
  • 23:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23646 and previous config saved to /var/cache/conftool/dbconfig/20220329-233001-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23645 and previous config saved to /var/cache/conftool/dbconfig/20220329-231456-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23644 and previous config saved to /var/cache/conftool/dbconfig/20220329-231248-ladsgroup.json
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23643 and previous config saved to /var/cache/conftool/dbconfig/20220329-231205-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23642 and previous config saved to /var/cache/conftool/dbconfig/20220329-225700-ladsgroup.json
  • 22:50 mutante: cumin1001 - systemctl start httpbb_hourly_appserver fixed Icinga alert after gerrit:774981 T205361
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23641 and previous config saved to /var/cache/conftool/dbconfig/20220329-224652-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23640 and previous config saved to /var/cache/conftool/dbconfig/20220329-224644-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23639 and previous config saved to /var/cache/conftool/dbconfig/20220329-224155-ladsgroup.json
  • 22:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 22:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 22:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 22:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 22:38 mutante: mwdebug2001 - rebooting
  • 22:36 mutante: mwdebug2002 - rebooting
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23638 and previous config saved to /var/cache/conftool/dbconfig/20220329-223139-ladsgroup.json
  • 22:31 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:30 mutante: moscovium (rt.wikimedia.org) - rebooting
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23637 and previous config saved to /var/cache/conftool/dbconfig/20220329-222650-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23636 and previous config saved to /var/cache/conftool/dbconfig/20220329-222141-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23635 and previous config saved to /var/cache/conftool/dbconfig/20220329-222128-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23634 and previous config saved to /var/cache/conftool/dbconfig/20220329-221634-ladsgroup.json
  • 22:14 mutante: doc1001 - rebooting (doc.wikimedia.org)
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23633 and previous config saved to /var/cache/conftool/dbconfig/20220329-220623-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23632 and previous config saved to /var/cache/conftool/dbconfig/20220329-220128-ladsgroup.json
  • 21:54 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:54 mutante: cumin1001 systemctl start httpbb_hourly_appserver
  • 21:54 mutante: cumin1001 systemctl status httpbb_hourly_appserver
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23631 and previous config saved to /var/cache/conftool/dbconfig/20220329-215118-ladsgroup.json
  • 21:48 mutante: doc1002 - rebooting
  • 21:46 mutante: doc2001 - rebooting
  • 21:38 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23630 and previous config saved to /var/cache/conftool/dbconfig/20220329-213613-ladsgroup.json
  • 21:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23629 and previous config saved to /var/cache/conftool/dbconfig/20220329-212804-ladsgroup.json
  • 21:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23628 and previous config saved to /var/cache/conftool/dbconfig/20220329-212756-ladsgroup.json
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mutante: aphlict1001 - manually starting aphlict service after reboot (was needed for some reason)
  • 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs2007.codfw.wmnet
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:17 mutante: aphlict1001 - rebooting - this will temp break Phabricator realtime notifications but will be back shortly
  • 21:17 mutante: planet1002 - rebooting
  • 21:14 mutante: planet2002 - rebooting
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23627 and previous config saved to /var/cache/conftool/dbconfig/20220329-211251-ladsgroup.json
  • 21:10 mutante: phab1004 - rebooting
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23626 and previous config saved to /var/cache/conftool/dbconfig/20220329-210856-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23625 and previous config saved to /var/cache/conftool/dbconfig/20220329-210848-ladsgroup.json
  • 21:05 mutante: phab2002 - rebooting
  • 21:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:59 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix I7ce58529cdd320a9500dc215291ef1c369cee9d3: Rearranging restriction levels and add editautopatrolprotected for eliminators. (T303579) (duration: 00m 56s)
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23624 and previous config saved to /var/cache/conftool/dbconfig/20220329-205746-ladsgroup.json
  • 20:57 catrope@deploy1002: Synchronized php-1.39.0-wmf.5/skins/Vector/skin.json: Backport: Restore the classes skin-vector and skin-vector-search-vue to body (duration: 00m 55s)
  • 20:54 catrope@deploy1002: Synchronized php-1.39.0-wmf.4/skins/Vector: Backport: Revert: End migration mode (duration: 00m 53s)
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23623 and previous config saved to /var/cache/conftool/dbconfig/20220329-205343-ladsgroup.json
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:50 catrope@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org). WARNING: canaries have not been rolled back.
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23622 and previous config saved to /var/cache/conftool/dbconfig/20220329-204241-ladsgroup.json
  • 20:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23621 and previous config saved to /var/cache/conftool/dbconfig/20220329-204034-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23620 and previous config saved to /var/cache/conftool/dbconfig/20220329-204021-ladsgroup.json
  • 20:39 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add wikimedia.com to wgNoFollowDomainExceptions (T304555) (duration: 01m 06s)
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23619 and previous config saved to /var/cache/conftool/dbconfig/20220329-203838-ladsgroup.json
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [config]: Deploy gdi-safety-survey to ES,EN,FR and PT wikis (duration: 00m 56s)
  • 20:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:30 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Config for new android schemas (duration: 01m 00s)
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23618 and previous config saved to /var/cache/conftool/dbconfig/20220329-202516-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23617 and previous config saved to /var/cache/conftool/dbconfig/20220329-202333-ladsgroup.json
  • 20:20 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23616 and previous config saved to /var/cache/conftool/dbconfig/20220329-201611-marostegui.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23615 and previous config saved to /var/cache/conftool/dbconfig/20220329-201041-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23614 and previous config saved to /var/cache/conftool/dbconfig/20220329-201011-ladsgroup.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P23613 and previous config saved to /var/cache/conftool/dbconfig/20220329-200106-marostegui.json
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23612 and previous config saved to /var/cache/conftool/dbconfig/20220329-195505-ladsgroup.json
  • 19:48 eileen: civicrm revision changed from 1c5d10e1 to 951ffb1d
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P23611 and previous config saved to /var/cache/conftool/dbconfig/20220329-194601-marostegui.json
  • 19:43 mforns@deploy1002: Finished deploy [analytics/refinery@8e9f97c] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8e9f97c] (duration: 07m 17s)
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23610 and previous config saved to /var/cache/conftool/dbconfig/20220329-194256-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23609 and previous config saved to /var/cache/conftool/dbconfig/20220329-194248-ladsgroup.json
  • 19:40 moritzm: uploaded cachelib 0.4.1-2~wmf1 to bullseye-wikimedia T301638
  • 19:35 mforns@deploy1002: Started deploy [analytics/refinery@8e9f97c] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8e9f97c]
  • 19:35 mforns@deploy1002: Finished deploy [analytics/refinery@8e9f97c] (thin): Regular analytics weekly train THIN [analytics/refinery@8e9f97c] (duration: 00m 08s)
  • 19:35 mforns@deploy1002: Started deploy [analytics/refinery@8e9f97c] (thin): Regular analytics weekly train THIN [analytics/refinery@8e9f97c]
  • 19:35 mforns@deploy1002: Finished deploy [analytics/refinery@8e9f97c]: Regular analytics weekly train [analytics/refinery@8e9f97c] (duration: 21m 13s)
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23608 and previous config saved to /var/cache/conftool/dbconfig/20220329-193055-marostegui.json
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 19:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23607 and previous config saved to /var/cache/conftool/dbconfig/20220329-192743-ladsgroup.json
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300775)', diff saved to https://phabricator.wikimedia.org/P23606 and previous config saved to /var/cache/conftool/dbconfig/20220329-191738-marostegui.json
  • 19:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300775)', diff saved to https://phabricator.wikimedia.org/P23605 and previous config saved to /var/cache/conftool/dbconfig/20220329-191731-marostegui.json
  • 19:14 mforns@deploy1002: Started deploy [analytics/refinery@8e9f97c]: Regular analytics weekly train [analytics/refinery@8e9f97c]
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23604 and previous config saved to /var/cache/conftool/dbconfig/20220329-191238-ladsgroup.json
  • 19:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P23603 and previous config saved to /var/cache/conftool/dbconfig/20220329-190226-marostegui.json
  • 19:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 18:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
  • 18:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23602 and previous config saved to /var/cache/conftool/dbconfig/20220329-185733-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23601 and previous config saved to /var/cache/conftool/dbconfig/20220329-185526-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23600 and previous config saved to /var/cache/conftool/dbconfig/20220329-185454-ladsgroup.json
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P23599 and previous config saved to /var/cache/conftool/dbconfig/20220329-184720-marostegui.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23598 and previous config saved to /var/cache/conftool/dbconfig/20220329-183949-ladsgroup.json
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300775)', diff saved to https://phabricator.wikimedia.org/P23597 and previous config saved to /var/cache/conftool/dbconfig/20220329-183215-marostegui.json
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T297189)', diff saved to https://phabricator.wikimedia.org/P23596 and previous config saved to /var/cache/conftool/dbconfig/20220329-183041-marostegui.json
  • 18:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23595 and previous config saved to /var/cache/conftool/dbconfig/20220329-183034-marostegui.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23594 and previous config saved to /var/cache/conftool/dbconfig/20220329-182444-ladsgroup.json
  • 18:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P23593 and previous config saved to /var/cache/conftool/dbconfig/20220329-181529-marostegui.json
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23592 and previous config saved to /var/cache/conftool/dbconfig/20220329-180938-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:02 moritzm: restarting fpm on mw canaries to pick up new libtiff
  • 18:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P23591 and previous config saved to /var/cache/conftool/dbconfig/20220329-180023-marostegui.json
  • 17:47 moritzm: installing tiff security updates
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23590 and previous config saved to /var/cache/conftool/dbconfig/20220329-174518-marostegui.json
  • 17:29 mutante: gitlab2001 - systemctl reset-failed
  • 17:23 mutante: gitlab2001 - did not come back from reboot via cookbook. logged in via console. then "s/ens5/ens13" in /etc/network/interfaces ; reboot ; issue was like T272555 and others
  • 17:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:13 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Testing deploy of superset 1.4.2 to staging
  • 17:13 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-tool1005.eqiad.wmnet with reason: Testing deploy of superset 1.4.2 to staging
  • 17:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2003.codfw.wmnet
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23589 and previous config saved to /var/cache/conftool/dbconfig/20220329-170924-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23588 and previous config saved to /var/cache/conftool/dbconfig/20220329-170916-ladsgroup.json
  • 17:04 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-staging2001.codfw.wmnet
  • 17:04 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 17:03 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2003.codfw.wmnet
  • 17:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2002.codfw.wmnet
  • 17:00 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 17:00 aokoth@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM gitlab2001.wikimedia.org
  • 16:55 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2002.codfw.wmnet
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23586 and previous config saved to /var/cache/conftool/dbconfig/20220329-165411-ladsgroup.json
  • 16:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ores2001.codfw.wmnet
  • 16:45 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ores2001.codfw.wmnet
  • 16:39 hashar@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 16:39 hashar@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23584 and previous config saved to /var/cache/conftool/dbconfig/20220329-163906-ladsgroup.json
  • 16:39 hashar@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:38 hashar@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 16:38 hashar@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:37 hashar@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23583 and previous config saved to /var/cache/conftool/dbconfig/20220329-163503-marostegui.json
  • 16:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T297189)', diff saved to https://phabricator.wikimedia.org/P23582 and previous config saved to /var/cache/conftool/dbconfig/20220329-163455-marostegui.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23581 and previous config saved to /var/cache/conftool/dbconfig/20220329-162401-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23580 and previous config saved to /var/cache/conftool/dbconfig/20220329-162153-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23579 and previous config saved to /var/cache/conftool/dbconfig/20220329-162146-ladsgroup.json
  • 16:21 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 16:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P23578 and previous config saved to /var/cache/conftool/dbconfig/20220329-161950-marostegui.json
  • 16:19 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
  • 16:17 aokoth@cumin1001: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM gitlab2001.wikimedia.org
  • 16:17 aokoth@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM gitlab2001.wikimedia.org
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23577 and previous config saved to /var/cache/conftool/dbconfig/20220329-160640-ladsgroup.json
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P23576 and previous config saved to /var/cache/conftool/dbconfig/20220329-160446-marostegui.json
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23575 and previous config saved to /var/cache/conftool/dbconfig/20220329-155415-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23574 and previous config saved to /var/cache/conftool/dbconfig/20220329-155135-ladsgroup.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T297189)', diff saved to https://phabricator.wikimedia.org/P23573 and previous config saved to /var/cache/conftool/dbconfig/20220329-154941-marostegui.json
  • 15:47 jayme@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 18s)
  • 15:47 jayme@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 15:46 jayme: updated scap to 4.5.0 on canary hosts - T304134
  • 15:43 jayme: imported scap 4.5.0 to stretch-/buster-/bullseye-wikimedia - T304134
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23572 and previous config saved to /var/cache/conftool/dbconfig/20220329-153910-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23571 and previous config saved to /var/cache/conftool/dbconfig/20220329-153630-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23570 and previous config saved to /var/cache/conftool/dbconfig/20220329-153423-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23569 and previous config saved to /var/cache/conftool/dbconfig/20220329-153410-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P23568 and previous config saved to /var/cache/conftool/dbconfig/20220329-152405-ladsgroup.json
  • 15:22 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 15:20 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 15:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 15:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 15:19 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23567 and previous config saved to /var/cache/conftool/dbconfig/20220329-151905-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23566 and previous config saved to /var/cache/conftool/dbconfig/20220329-150900-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23565 and previous config saved to /var/cache/conftool/dbconfig/20220329-150359-ladsgroup.json
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T297189)', diff saved to https://phabricator.wikimedia.org/P23564 and previous config saved to /var/cache/conftool/dbconfig/20220329-150253-marostegui.json
  • 15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T297189)', diff saved to https://phabricator.wikimedia.org/P23563 and previous config saved to /var/cache/conftool/dbconfig/20220329-150245-marostegui.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23562 and previous config saved to /var/cache/conftool/dbconfig/20220329-144854-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298565)', diff saved to https://phabricator.wikimedia.org/P23561 and previous config saved to /var/cache/conftool/dbconfig/20220329-144848-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23560 and previous config saved to /var/cache/conftool/dbconfig/20220329-144835-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23559 and previous config saved to /var/cache/conftool/dbconfig/20220329-144747-ladsgroup.json
  • 14:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P23558 and previous config saved to /var/cache/conftool/dbconfig/20220329-144740-marostegui.json
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23557 and previous config saved to /var/cache/conftool/dbconfig/20220329-144739-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23556 and previous config saved to /var/cache/conftool/dbconfig/20220329-143330-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23555 and previous config saved to /var/cache/conftool/dbconfig/20220329-143234-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P23554 and previous config saved to /var/cache/conftool/dbconfig/20220329-141825-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23553 and previous config saved to /var/cache/conftool/dbconfig/20220329-141729-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23552 and previous config saved to /var/cache/conftool/dbconfig/20220329-140320-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23551 and previous config saved to /var/cache/conftool/dbconfig/20220329-140224-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23550 and previous config saved to /var/cache/conftool/dbconfig/20220329-140017-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23549 and previous config saved to /var/cache/conftool/dbconfig/20220329-140009-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23548 and previous config saved to /var/cache/conftool/dbconfig/20220329-134504-ladsgroup.json
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23547 and previous config saved to /var/cache/conftool/dbconfig/20220329-132959-ladsgroup.json
  • 13:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set write both for all wikis except s1 and s4 (T299421) (duration: 00m 55s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 urbanecm: UTC afternoon B&C window done
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: d632476: 64226d7: Set IPInfo config for path to MaxMind files (T304604) (duration: 00m 54s)
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23546 and previous config saved to /var/cache/conftool/dbconfig/20220329-131453-ladsgroup.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T297189)', diff saved to https://phabricator.wikimedia.org/P23545 and previous config saved to /var/cache/conftool/dbconfig/20220329-131251-marostegui.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23544 and previous config saved to /var/cache/conftool/dbconfig/20220329-131246-ladsgroup.json
  • 13:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T297189)', diff saved to https://phabricator.wikimedia.org/P23543 and previous config saved to /var/cache/conftool/dbconfig/20220329-131238-marostegui.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23542 and previous config saved to /var/cache/conftool/dbconfig/20220329-131159-ladsgroup.json
  • 13:10 XioNoX: rollback: temporarily apply urpf with action: log only, on cr1-eqiad:xe-3/0/4.1118
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23541 and previous config saved to /var/cache/conftool/dbconfig/20220329-130741-ladsgroup.json
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23540 and previous config saved to /var/cache/conftool/dbconfig/20220329-130733-ladsgroup.json
  • 13:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P23539 and previous config saved to /var/cache/conftool/dbconfig/20220329-125733-marostegui.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23538 and previous config saved to /var/cache/conftool/dbconfig/20220329-125654-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23537 and previous config saved to /var/cache/conftool/dbconfig/20220329-125228-ladsgroup.json
  • 12:51 XioNoX: temporarily apply urpf with action: log only, on cr1-eqiad:xe-3/0/4.1118
  • 12:44 mmandere: pool cp2034 with HAProxy as TLS termination layer - T290005
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P23536 and previous config saved to /var/cache/conftool/dbconfig/20220329-124227-marostegui.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23535 and previous config saved to /var/cache/conftool/dbconfig/20220329-124148-ladsgroup.json
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23534 and previous config saved to /var/cache/conftool/dbconfig/20220329-123723-ladsgroup.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T297189)', diff saved to https://phabricator.wikimedia.org/P23533 and previous config saved to /var/cache/conftool/dbconfig/20220329-122722-marostegui.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23532 and previous config saved to /var/cache/conftool/dbconfig/20220329-122643-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23531 and previous config saved to /var/cache/conftool/dbconfig/20220329-122436-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23530 and previous config saved to /var/cache/conftool/dbconfig/20220329-122404-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23529 and previous config saved to /var/cache/conftool/dbconfig/20220329-122218-ladsgroup.json
  • 12:17 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2034.codfw.wmnet with OS buster
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300775)', diff saved to https://phabricator.wikimedia.org/P23528 and previous config saved to /var/cache/conftool/dbconfig/20220329-121248-marostegui.json
  • 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300775)', diff saved to https://phabricator.wikimedia.org/P23527 and previous config saved to /var/cache/conftool/dbconfig/20220329-121240-marostegui.json
  • 12:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23526 and previous config saved to /var/cache/conftool/dbconfig/20220329-120859-ladsgroup.json
  • 12:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:02 hashar@deploy1002: Synchronized php-1.39.0-wmf.5/skins/Timeless/includes/TimelessTemplate.php: Use null coalescing operator - T304917 (duration: 06m 50s)
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P23525 and previous config saved to /var/cache/conftool/dbconfig/20220329-115735-marostegui.json
  • 11:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:56 mmandere@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23524 and previous config saved to /var/cache/conftool/dbconfig/20220329-115354-ladsgroup.json
  • 11:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2034.codfw.wmnet with reason: host reimage
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P23523 and previous config saved to /var/cache/conftool/dbconfig/20220329-114230-marostegui.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23522 and previous config saved to /var/cache/conftool/dbconfig/20220329-113849-ladsgroup.json
  • 11:33 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2034.codfw.wmnet with OS buster
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23521 and previous config saved to /var/cache/conftool/dbconfig/20220329-112958-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300775)', diff saved to https://phabricator.wikimedia.org/P23520 and previous config saved to /var/cache/conftool/dbconfig/20220329-112725-marostegui.json
  • 11:25 mmandere: depool cp2034 for reimage - T290005
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23519 and previous config saved to /var/cache/conftool/dbconfig/20220329-112109-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23518 and previous config saved to /var/cache/conftool/dbconfig/20220329-112101-ladsgroup.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23517 and previous config saved to /var/cache/conftool/dbconfig/20220329-111454-root.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23516 and previous config saved to /var/cache/conftool/dbconfig/20220329-110555-ladsgroup.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T297189)', diff saved to https://phabricator.wikimedia.org/P23515 and previous config saved to /var/cache/conftool/dbconfig/20220329-110024-marostegui.json
  • 11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23514 and previous config saved to /var/cache/conftool/dbconfig/20220329-110016-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P23513 and previous config saved to /var/cache/conftool/dbconfig/20220329-105950-root.json
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23512 and previous config saved to /var/cache/conftool/dbconfig/20220329-105050-ladsgroup.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P23511 and previous config saved to /var/cache/conftool/dbconfig/20220329-104511-marostegui.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P23510 and previous config saved to /var/cache/conftool/dbconfig/20220329-104446-root.json
  • 10:43 mmandere: pool cp2027 with HAProxy as TLS termination layer - T290005
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23509 and previous config saved to /var/cache/conftool/dbconfig/20220329-103834-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23508 and previous config saved to /var/cache/conftool/dbconfig/20220329-103826-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23507 and previous config saved to /var/cache/conftool/dbconfig/20220329-103544-ladsgroup.json
  • 10:35 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2027.codfw.wmnet with OS buster
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P23506 and previous config saved to /var/cache/conftool/dbconfig/20220329-103006-marostegui.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P23505 and previous config saved to /var/cache/conftool/dbconfig/20220329-102942-root.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23504 and previous config saved to /var/cache/conftool/dbconfig/20220329-102321-ladsgroup.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23503 and previous config saved to /var/cache/conftool/dbconfig/20220329-101501-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 5%: After reimage', diff saved to https://phabricator.wikimedia.org/P23502 and previous config saved to /var/cache/conftool/dbconfig/20220329-101439-root.json
  • 10:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 10:10 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2027.codfw.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23501 and previous config saved to /var/cache/conftool/dbconfig/20220329-100821-root.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23500 and previous config saved to /var/cache/conftool/dbconfig/20220329-100816-ladsgroup.json
  • 10:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 10:02 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 10:02 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: After reimage', diff saved to https://phabricator.wikimedia.org/P23499 and previous config saved to /var/cache/conftool/dbconfig/20220329-095935-root.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1157.eqiad.wmnet with OS bullseye
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23498 and previous config saved to /var/cache/conftool/dbconfig/20220329-095317-root.json
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23497 and previous config saved to /var/cache/conftool/dbconfig/20220329-095310-ladsgroup.json
  • 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2027.codfw.wmnet with OS buster
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23496 and previous config saved to /var/cache/conftool/dbconfig/20220329-095103-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 09:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23495 and previous config saved to /var/cache/conftool/dbconfig/20220329-095026-ladsgroup.json
  • 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298565)', diff saved to https://phabricator.wikimedia.org/P23494 and previous config saved to /var/cache/conftool/dbconfig/20220329-094342-ladsgroup.json
  • 09:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23493 and previous config saved to /var/cache/conftool/dbconfig/20220329-094334-ladsgroup.json
  • 09:43 mmandere: depool cp2027 for reimage - T290005
  • 09:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: host reimage
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23492 and previous config saved to /var/cache/conftool/dbconfig/20220329-093807-root.json
  • 09:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: host reimage
  • 09:35 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23491 and previous config saved to /var/cache/conftool/dbconfig/20220329-093521-ladsgroup.json
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:31 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.5 refs T300204
  • 09:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23490 and previous config saved to /var/cache/conftool/dbconfig/20220329-092829-ladsgroup.json
  • 09:28 hashar@deploy1002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 03m 49s)
  • 09:24 hashar@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.5 (duration: 77m 17s)
  • 09:24 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1157.eqiad.wmnet with OS bullseye
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23489 and previous config saved to /var/cache/conftool/dbconfig/20220329-092303-root.json
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23488 and previous config saved to /var/cache/conftool/dbconfig/20220329-092016-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23487 and previous config saved to /var/cache/conftool/dbconfig/20220329-091324-ladsgroup.json
  • 09:11 marostegui: dbmaint s3@eqiad T298294
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23486 and previous config saved to /var/cache/conftool/dbconfig/20220329-090759-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P23485 and previous config saved to /var/cache/conftool/dbconfig/20220329-090737-marostegui.json
  • 09:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23484 and previous config saved to /var/cache/conftool/dbconfig/20220329-090510-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23483 and previous config saved to /var/cache/conftool/dbconfig/20220329-090303-ladsgroup.json
  • 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23482 and previous config saved to /var/cache/conftool/dbconfig/20220329-090250-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23481 and previous config saved to /var/cache/conftool/dbconfig/20220329-085819-ladsgroup.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23480 and previous config saved to /var/cache/conftool/dbconfig/20220329-084745-ladsgroup.json
  • 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:41 marostegui: dbmaint s3@eqiad T298557
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23479 and previous config saved to /var/cache/conftool/dbconfig/20220329-083240-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23478 and previous config saved to /var/cache/conftool/dbconfig/20220329-081735-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23477 and previous config saved to /var/cache/conftool/dbconfig/20220329-081527-ladsgroup.json
  • 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23476 and previous config saved to /var/cache/conftool/dbconfig/20220329-081519-ladsgroup.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298565)', diff saved to https://phabricator.wikimedia.org/P23475 and previous config saved to /var/cache/conftool/dbconfig/20220329-081124-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23474 and previous config saved to /var/cache/conftool/dbconfig/20220329-081116-ladsgroup.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 hashar@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.5
  • 08:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:02 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host hppxetest2001.codfw.wmnet with OS bullseye
  • 08:01 marostegui: dbmaint s3@eqiad T298563
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23473 and previous config saved to /var/cache/conftool/dbconfig/20220329-080014-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23472 and previous config saved to /var/cache/conftool/dbconfig/20220329-075611-ladsgroup.json
  • 07:48 marostegui: dbmaint s3@eqiad T298554
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23471 and previous config saved to /var/cache/conftool/dbconfig/20220329-074509-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23470 and previous config saved to /var/cache/conftool/dbconfig/20220329-074106-ladsgroup.json
  • 07:37 marostegui: dbmaint s6@eqiad T297189
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P23469 and previous config saved to /var/cache/conftool/dbconfig/20220329-073703-root.json
  • 07:36 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host hppxetest2001.codfw.wmnet with OS bullseye
  • 07:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host hppxetest2001.codfw.wmnet
  • 07:34 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host hppxetest2001.codfw.wmnet
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23468 and previous config saved to /var/cache/conftool/dbconfig/20220329-073004-ladsgroup.json
  • 07:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23467 and previous config saved to /var/cache/conftool/dbconfig/20220329-072756-ladsgroup.json
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23466 and previous config saved to /var/cache/conftool/dbconfig/20220329-072744-ladsgroup.json
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23465 and previous config saved to /var/cache/conftool/dbconfig/20220329-072601-ladsgroup.json
  • 07:24 taavi: UTC morning deploys done
  • 07:23 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add viwiki eliminators to wgContentTranslationPublishRequirements (T299636) (duration: 00m 50s)
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23464 and previous config saved to /var/cache/conftool/dbconfig/20220329-071239-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298565)', diff saved to https://phabricator.wikimedia.org/P23463 and previous config saved to /var/cache/conftool/dbconfig/20220329-071148-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23462 and previous config saved to /var/cache/conftool/dbconfig/20220329-071140-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23461 and previous config saved to /var/cache/conftool/dbconfig/20220329-065734-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23460 and previous config saved to /var/cache/conftool/dbconfig/20220329-065635-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23459 and previous config saved to /var/cache/conftool/dbconfig/20220329-064229-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23458 and previous config saved to /var/cache/conftool/dbconfig/20220329-064130-ladsgroup.json
  • 06:40 _joe_: restarting varnish text-fe on cp1079
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23457 and previous config saved to /var/cache/conftool/dbconfig/20220329-064021-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23456 and previous config saved to /var/cache/conftool/dbconfig/20220329-064013-ladsgroup.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23455 and previous config saved to /var/cache/conftool/dbconfig/20220329-062912-marostegui.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23454 and previous config saved to /var/cache/conftool/dbconfig/20220329-062625-ladsgroup.json
  • 06:25 marostegui: dbmaint s3@eqiad T300775
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23453 and previous config saved to /var/cache/conftool/dbconfig/20220329-062508-ladsgroup.json
  • 06:17 marostegui: dbmaint s3@eqiad T300381
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23452 and previous config saved to /var/cache/conftool/dbconfig/20220329-061407-marostegui.json
  • 06:11 marostegui: Maintenance on db1157 (old s3 master) T301848
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23451 and previous config saved to /var/cache/conftool/dbconfig/20220329-061004-ladsgroup.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157 T301850', diff saved to https://phabricator.wikimedia.org/P23450 and previous config saved to /var/cache/conftool/dbconfig/20220329-060532-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1123 to s3 primary and set section read-write T301850', diff saved to https://phabricator.wikimedia.org/P23449 and previous config saved to /var/cache/conftool/dbconfig/20220329-060059-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T301850', diff saved to https://phabricator.wikimedia.org/P23448 and previous config saved to /var/cache/conftool/dbconfig/20220329-060024-marostegui.json
  • 06:00 marostegui: Starting s3 eqiad failover from db1157 to db1123 - T301850
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23447 and previous config saved to /var/cache/conftool/dbconfig/20220329-055902-marostegui.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298565)', diff saved to https://phabricator.wikimedia.org/P23446 and previous config saved to /var/cache/conftool/dbconfig/20220329-055544-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23445 and previous config saved to /var/cache/conftool/dbconfig/20220329-055458-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23444 and previous config saved to /var/cache/conftool/dbconfig/20220329-055251-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23443 and previous config saved to /var/cache/conftool/dbconfig/20220329-054357-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23442 and previous config saved to /var/cache/conftool/dbconfig/20220329-052331-marostegui.json
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300775)', diff saved to https://phabricator.wikimedia.org/P23441 and previous config saved to /var/cache/conftool/dbconfig/20220329-051951-marostegui.json
  • 05:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300775)', diff saved to https://phabricator.wikimedia.org/P23440 and previous config saved to /var/cache/conftool/dbconfig/20220329-051943-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P23439 and previous config saved to /var/cache/conftool/dbconfig/20220329-050438-marostegui.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1123 with weight 0 T301850', diff saved to https://phabricator.wikimedia.org/P23438 and previous config saved to /var/cache/conftool/dbconfig/20220329-050234-root.json
  • 05:02 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 20 hosts with reason: Primary switchover s3 T301850
  • 05:02 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 20 hosts with reason: Primary switchover s3 T301850
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P23437 and previous config saved to /var/cache/conftool/dbconfig/20220329-044933-marostegui.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300775)', diff saved to https://phabricator.wikimedia.org/P23436 and previous config saved to /var/cache/conftool/dbconfig/20220329-043428-marostegui.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-03-28

  • 23:15 eileen: civicrm revision 15d22bd1 -> 1c5d10e1
  • 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300775)', diff saved to https://phabricator.wikimedia.org/P23434 and previous config saved to /var/cache/conftool/dbconfig/20220328-230012-marostegui.json
  • 23:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 23:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300775)', diff saved to https://phabricator.wikimedia.org/P23433 and previous config saved to /var/cache/conftool/dbconfig/20220328-230004-marostegui.json
  • 22:52 ejegg: updated fundraising python tools from 409c80b7 to 8f5119f6
  • 22:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P23431 and previous config saved to /var/cache/conftool/dbconfig/20220328-224459-marostegui.json
  • 22:39 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'enable-puppet T205361'
  • 22:31 rzl: rzl@cumin2002:~$ sudo cumin A:mw 'disable-puppet T205361'
  • 22:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P23430 and previous config saved to /var/cache/conftool/dbconfig/20220328-222953-marostegui.json
  • 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300775)', diff saved to https://phabricator.wikimedia.org/P23429 and previous config saved to /var/cache/conftool/dbconfig/20220328-221448-marostegui.json
  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 sbassett: Undeployed sec patch for T285159, which caused a high volume of errors on the canaries
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:12 eileen: civicrm revision 4e5b37c3 -> 15d22bd1
  • 21:09 eileen: tools revision changed from d1d7b100 to 409c80b7
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 eileen: revision changed from d1d7b100 to 409c80b7
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 sbassett@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Deploy CS-labs.php config to set StopForumSpam to enforce on beta (duration: 01m 03s)
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:34 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.4/extensions/VisualEditor/modules/ve-mw/ui/ve.ui.MWSequenceRegistry.js: f32ae21: Disable backtick sequence in ve-mw while conflict with Catalan is investigated (T304804) (duration: 00m 57s)
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: dfa9638: Stop writing to $wmfAllServices (T45956) (duration: 00m 55s)
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e8a5b3b: GrowthExperiments: Add more expanded topics for GLAM campaign (T301029) (duration: 00m 50s)
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 herron: pruned /var/log/apache2/puppetmaster.puppet.log.[123]* on puppetmaster1001 T304898
  • 19:20 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:09 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab2001-vcs.codfw.wmnet
  • 19:07 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab2001.codfw.wmnet
  • 18:53 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=phab2001.codfw.wmnet
  • 18:50 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab2001.codfw.wmnet
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23426 and previous config saved to /var/cache/conftool/dbconfig/20220328-173340-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P23425 and previous config saved to /var/cache/conftool/dbconfig/20220328-171835-marostegui.json
  • 17:05 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1142.eqiad.wmnet with OS buster
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P23424 and previous config saved to /var/cache/conftool/dbconfig/20220328-170330-marostegui.json
  • 16:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23423 and previous config saved to /var/cache/conftool/dbconfig/20220328-164825-marostegui.json
  • 16:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300775)', diff saved to https://phabricator.wikimedia.org/P23422 and previous config saved to /var/cache/conftool/dbconfig/20220328-163903-marostegui.json
  • 16:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300775)', diff saved to https://phabricator.wikimedia.org/P23421 and previous config saved to /var/cache/conftool/dbconfig/20220328-163855-marostegui.json
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23420 and previous config saved to /var/cache/conftool/dbconfig/20220328-162644-marostegui.json
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298556)', diff saved to https://phabricator.wikimedia.org/P23419 and previous config saved to /var/cache/conftool/dbconfig/20220328-162633-marostegui.json
  • 16:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P23418 and previous config saved to /var/cache/conftool/dbconfig/20220328-162350-marostegui.json
  • 16:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:16 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:14 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:13 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23417 and previous config saved to /var/cache/conftool/dbconfig/20220328-161128-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P23416 and previous config saved to /var/cache/conftool/dbconfig/20220328-160845-marostegui.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P23415 and previous config saved to /var/cache/conftool/dbconfig/20220328-155622-marostegui.json
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300775)', diff saved to https://phabricator.wikimedia.org/P23414 and previous config saved to /var/cache/conftool/dbconfig/20220328-155340-marostegui.json
  • 15:52 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@b5b63c3]: (no justification provided) (duration: 02m 09s)
  • 15:50 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@b5b63c3]: (no justification provided)
  • 15:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298556)', diff saved to https://phabricator.wikimedia.org/P23413 and previous config saved to /var/cache/conftool/dbconfig/20220328-154117-marostegui.json
  • 15:39 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 00s)
  • 15:38 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 54s)
  • 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster
  • 15:23 moritzm: imported libapache2-mod-auth-cas 1.2-1+wmf11u2 to apt.wikimedia.org/bullseye-wikimedia
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298556)', diff saved to https://phabricator.wikimedia.org/P23412 and previous config saved to /var/cache/conftool/dbconfig/20220328-152114-marostegui.json
  • 15:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298556)', diff saved to https://phabricator.wikimedia.org/P23411 and previous config saved to /var/cache/conftool/dbconfig/20220328-152105-marostegui.json
  • 15:15 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[1001-1004].eqiad.wmnet
  • 15:15 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[2001-2004].codfw.wmnet
  • 15:11 moritzm: imported libapache2-mod-auth-cas 1.2-1+wmf10u2 to apt.wikimedia.org/buster-wikimedia
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23410 and previous config saved to /var/cache/conftool/dbconfig/20220328-150600-marostegui.json
  • 15:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P23409 and previous config saved to /var/cache/conftool/dbconfig/20220328-145055-marostegui.json
  • 14:48 akosiaris@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:47 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 inflatador: 'bking@cumin1001 repooling wdqs services in IAD ref T302494'
  • 14:46 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=eqiad
  • 14:45 bking@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs*,name=eqiad
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298556)', diff saved to https://phabricator.wikimedia.org/P23408 and previous config saved to /var/cache/conftool/dbconfig/20220328-143550-marostegui.json
  • 14:28 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:20 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 14:20 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:19 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298556)', diff saved to https://phabricator.wikimedia.org/P23407 and previous config saved to /var/cache/conftool/dbconfig/20220328-141552-marostegui.json
  • 14:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23406 and previous config saved to /var/cache/conftool/dbconfig/20220328-141544-marostegui.json
  • 14:15 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubernetes[2001-2004].codfw.wmnet
  • 14:13 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts kubernetes[1001-1004].eqiad.wmnet
  • 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 akosiaris: decommission kubernetes100[1-4]. T303044
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:07 mmandere: pool cp2029 with HAProxy as TLS termination layer - T290005
  • 14:06 taavi: deploy security patch for T226212
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23405 and previous config saved to /var/cache/conftool/dbconfig/20220328-140039-marostegui.json
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (5/5, prod noop) (duration: 01m 04s)
  • 13:57 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CirrusSearch-labs.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (4/5, prod noop) (duration: 01m 07s)
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/filebackend.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (3/5) (duration: 00m 51s)
  • 13:54 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (2/5) (duration: 00m 56s)
  • 13:53 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: Migrate $wmfAllServices to $wmgAllServices (T45956) (1/5) (duration: 00m 51s)
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/throttle.php: Config: Throttle: Add rule for Bard College class project on enwiki (T304687) (duration: 00m 54s)
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P23404 and previous config saved to /var/cache/conftool/dbconfig/20220328-134534-marostegui.json
  • 13:40 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2029.codfw.wmnet with OS buster
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23403 and previous config saved to /var/cache/conftool/dbconfig/20220328-133029-marostegui.json
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 lucaswerkmeister-wmde@deploy1002: Synchronized phpcs.xml: Config: phpcs: narrow some exclusions only needed for cirrusTest.php (T171115) (2/2) (duration: 00m 55s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:22 lucaswerkmeister-wmde@deploy1002: Synchronized tests/cirrusTest.php: Config: phpcs: narrow some exclusions only needed for cirrusTest.php (T171115) (1/2) (duration: 00m 56s)
  • 13:18 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
  • 13:17 lucaswerkmeister-wmde@deploy1002: Synchronized phpcs.xml: Config: phpcs: enable passing rule UnusedGlobalVariables (T171115) (includes phpcs.xml change from previous sync) (duration: 00m 56s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2029.codfw.wmnet with reason: host reimage
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: phpcs: enable and fix SingleSpaceBeforeSingleLineComment (T171115) (phpcs.xml will be synced with next patch) (duration: 01m 01s)
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Write "unexpectedUnconnectedPage" page prop everywhere (duration: 00m 56s)
  • 13:04 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:03 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:57 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2029.codfw.wmnet with OS buster
  • 12:50 mmandere: depool cp2029 for reimage - T290005
  • 12:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:44 moritzm: installing Intel microcode updates 2022-02-07 on Buster
  • 12:44 mmandere: pool cp2031 with HAProxy as TLS termination layer - T290005
  • 12:43 urbanecm: Clear signup authentication throttle per https://wikitech.wikimedia.org/wiki/Increasing_account_creation_threshold for 195.113.155.4 (T304836)
  • 12:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:41 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 3ba524d: throttle: Add rule for Czech Wikigap 2022 (T304836) (duration: 00m 52s)
  • 12:40 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 12:39 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2031.codfw.wmnet with OS buster
  • 12:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:36 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 12:32 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 12:31 jayme@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298556)', diff saved to https://phabricator.wikimedia.org/P23402 and previous config saved to /var/cache/conftool/dbconfig/20220328-123015-marostegui.json
  • 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298556)', diff saved to https://phabricator.wikimedia.org/P23401 and previous config saved to /var/cache/conftool/dbconfig/20220328-123007-marostegui.json
  • 12:16 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23400 and previous config saved to /var/cache/conftool/dbconfig/20220328-121501-marostegui.json
  • 12:13 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2031.codfw.wmnet with reason: host reimage
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P23399 and previous config saved to /var/cache/conftool/dbconfig/20220328-115956-marostegui.json
  • 11:55 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2031.codfw.wmnet with OS buster
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298556)', diff saved to https://phabricator.wikimedia.org/P23398 and previous config saved to /var/cache/conftool/dbconfig/20220328-114451-marostegui.json
  • 11:44 mmandere: depool cp2031 for reimage - T290005
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 11:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 11:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 11:25 moritzm: installing Intel microcode updates 2022-02-07 on Bullseye
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298556)', diff saved to https://phabricator.wikimedia.org/P23397 and previous config saved to /var/cache/conftool/dbconfig/20220328-112352-marostegui.json
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23396 and previous config saved to /var/cache/conftool/dbconfig/20220328-112345-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23395 and previous config saved to /var/cache/conftool/dbconfig/20220328-110839-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P23394 and previous config saved to /var/cache/conftool/dbconfig/20220328-105333-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23393 and previous config saved to /var/cache/conftool/dbconfig/20220328-103828-marostegui.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300775)', diff saved to https://phabricator.wikimedia.org/P23392 and previous config saved to /var/cache/conftool/dbconfig/20220328-102915-marostegui.json
  • 10:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23391 and previous config saved to /var/cache/conftool/dbconfig/20220328-102014-root.json
  • 10:17 mmandere: pool cp2033 with HAProxy as TLS termination layer - T290005
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298556)', diff saved to https://phabricator.wikimedia.org/P23390 and previous config saved to /var/cache/conftool/dbconfig/20220328-101712-marostegui.json
  • 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298556)', diff saved to https://phabricator.wikimedia.org/P23389 and previous config saved to /var/cache/conftool/dbconfig/20220328-101704-marostegui.json
  • 10:13 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2033.codfw.wmnet with OS buster
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23387 and previous config saved to /var/cache/conftool/dbconfig/20220328-100511-root.json
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23386 and previous config saved to /var/cache/conftool/dbconfig/20220328-100159-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23385 and previous config saved to /var/cache/conftool/dbconfig/20220328-095007-root.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P23384 and previous config saved to /var/cache/conftool/dbconfig/20220328-094653-marostegui.json
  • 09:46 moritzm: installing Linux 4.9.303 on Stretch hosts
  • 09:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
  • 09:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2033.codfw.wmnet with reason: host reimage
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23383 and previous config saved to /var/cache/conftool/dbconfig/20220328-093503-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298556)', diff saved to https://phabricator.wikimedia.org/P23382 and previous config saved to /var/cache/conftool/dbconfig/20220328-093148-marostegui.json
  • 09:24 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp2033.codfw.wmnet with OS buster
  • 09:13 moritzm: installing Linux 4.19.235 on Buster hosts
  • 09:11 mmandere: depool cp2033 for reimage - T290005
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298556)', diff saved to https://phabricator.wikimedia.org/P23379 and previous config saved to /var/cache/conftool/dbconfig/20220328-091041-marostegui.json
  • 09:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298556)', diff saved to https://phabricator.wikimedia.org/P23378 and previous config saved to /var/cache/conftool/dbconfig/20220328-091033-marostegui.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23377 and previous config saved to /var/cache/conftool/dbconfig/20220328-090445-root.json
  • 09:03 moritzm: installing Linux 5.10.106 on Bullseye hosts
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23376 and previous config saved to /var/cache/conftool/dbconfig/20220328-085528-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23375 and previous config saved to /var/cache/conftool/dbconfig/20220328-085507-root.json
  • 08:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:50 jynus: deploy new alerting (0.7.1) for db backups at alert1001 T138562
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23374 and previous config saved to /var/cache/conftool/dbconfig/20220328-084941-root.json
  • 08:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:47 marostegui: dbmaint s1@eqiad T304812
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 T304812', diff saved to https://phabricator.wikimedia.org/P23373 and previous config saved to /var/cache/conftool/dbconfig/20220328-084705-marostegui.json
  • 08:46 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WRITE BOTH for templatelinks normalization in more wikis (T299421) (duration: 00m 54s)
  • 08:46 _joe_: uploading conftool 2.0.0, T302471
  • 08:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable videojs in the second batch of wikis (T248418) (duration: 00m 55s)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P23371 and previous config saved to /var/cache/conftool/dbconfig/20220328-084023-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23370 and previous config saved to /var/cache/conftool/dbconfig/20220328-084003-root.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23369 and previous config saved to /var/cache/conftool/dbconfig/20220328-083437-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298556)', diff saved to https://phabricator.wikimedia.org/P23368 and previous config saved to /var/cache/conftool/dbconfig/20220328-082518-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23367 and previous config saved to /var/cache/conftool/dbconfig/20220328-082459-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23366 and previous config saved to /var/cache/conftool/dbconfig/20220328-081933-root.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23365 and previous config saved to /var/cache/conftool/dbconfig/20220328-080955-root.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23364 and previous config saved to /var/cache/conftool/dbconfig/20220328-080841-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After downgrade ', diff saved to https://phabricator.wikimedia.org/P23363 and previous config saved to /var/cache/conftool/dbconfig/20220328-080429-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298556)', diff saved to https://phabricator.wikimedia.org/P23362 and previous config saved to /var/cache/conftool/dbconfig/20220328-080409-marostegui.json
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298556)', diff saved to https://phabricator.wikimedia.org/P23361 and previous config saved to /var/cache/conftool/dbconfig/20220328-080401-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23360 and previous config saved to /var/cache/conftool/dbconfig/20220328-075451-root.json
  • 07:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23359 and previous config saved to /var/cache/conftool/dbconfig/20220328-075337-root.json
  • 07:51 marostegui: dbmaint s1@codfw T304812
  • 07:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23358 and previous config saved to /var/cache/conftool/dbconfig/20220328-074856-marostegui.json
  • 07:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:39 moritzm: updated d-i images for Buster 10.12 release T304546
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23357 and previous config saved to /var/cache/conftool/dbconfig/20220328-073833-root.json
  • 07:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:34 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove unused CentralAuth settings (2/2) (duration: 00m 55s)
  • 07:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused CentralAuth settings (1/2) (duration: 00m 56s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23356 and previous config saved to /var/cache/conftool/dbconfig/20220328-073351-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23355 and previous config saved to /var/cache/conftool/dbconfig/20220328-072329-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298556)', diff saved to https://phabricator.wikimedia.org/P23354 and previous config saved to /var/cache/conftool/dbconfig/20220328-071846-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P23353 and previous config saved to /var/cache/conftool/dbconfig/20220328-071427-marostegui.json
  • 07:13 moritzm: updated d-i images for Bullseye 11.3 release T304599
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23352 and previous config saved to /var/cache/conftool/dbconfig/20220328-070825-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23351 and previous config saved to /var/cache/conftool/dbconfig/20220328-070700-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23350 and previous config saved to /var/cache/conftool/dbconfig/20220328-070154-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23349 and previous config saved to /var/cache/conftool/dbconfig/20220328-070139-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 for downgrade', diff saved to https://phabricator.wikimedia.org/P23348 and previous config saved to /var/cache/conftool/dbconfig/20220328-070056-marostegui.json
  • 06:52 elukey: reboot ml-serve-ctrl1002 - ganeti console available but slow (attempted root login but never got to input the password)
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23347 and previous config saved to /var/cache/conftool/dbconfig/20220328-065156-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298556)', diff saved to https://phabricator.wikimedia.org/P23346 and previous config saved to /var/cache/conftool/dbconfig/20220328-065048-marostegui.json
  • 06:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 06:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298556)', diff saved to https://phabricator.wikimedia.org/P23345 and previous config saved to /var/cache/conftool/dbconfig/20220328-065040-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23344 and previous config saved to /var/cache/conftool/dbconfig/20220328-064650-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23343 and previous config saved to /var/cache/conftool/dbconfig/20220328-064635-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23342 and previous config saved to /var/cache/conftool/dbconfig/20220328-063652-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23341 and previous config saved to /var/cache/conftool/dbconfig/20220328-063535-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23340 and previous config saved to /var/cache/conftool/dbconfig/20220328-063146-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23339 and previous config saved to /var/cache/conftool/dbconfig/20220328-063131-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23338 and previous config saved to /var/cache/conftool/dbconfig/20220328-062149-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P23337 and previous config saved to /var/cache/conftool/dbconfig/20220328-062030-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23336 and previous config saved to /var/cache/conftool/dbconfig/20220328-061642-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23335 and previous config saved to /var/cache/conftool/dbconfig/20220328-061627-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After schema downgrade ', diff saved to https://phabricator.wikimedia.org/P23334 and previous config saved to /var/cache/conftool/dbconfig/20220328-060645-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298556)', diff saved to https://phabricator.wikimedia.org/P23333 and previous config saved to /var/cache/conftool/dbconfig/20220328-060525-marostegui.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for downgrade', diff saved to https://phabricator.wikimedia.org/P23332 and previous config saved to /var/cache/conftool/dbconfig/20220328-060239-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23331 and previous config saved to /var/cache/conftool/dbconfig/20220328-060138-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After downgrade', diff saved to https://phabricator.wikimedia.org/P23330 and previous config saved to /var/cache/conftool/dbconfig/20220328-060123-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 for downgrade', diff saved to https://phabricator.wikimedia.org/P23329 and previous config saved to /var/cache/conftool/dbconfig/20220328-054552-marostegui.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298556)', diff saved to https://phabricator.wikimedia.org/P23328 and previous config saved to /var/cache/conftool/dbconfig/20220328-053816-marostegui.json
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 12 hosts with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 12 hosts with reason: Maintenance
  • 05:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
  • 05:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23327 and previous config saved to /var/cache/conftool/dbconfig/20220328-042334-ladsgroup.json
  • 04:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23326 and previous config saved to /var/cache/conftool/dbconfig/20220328-040829-ladsgroup.json
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23325 and previous config saved to /var/cache/conftool/dbconfig/20220328-035323-ladsgroup.json
  • 03:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23324 and previous config saved to /var/cache/conftool/dbconfig/20220328-033818-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23323 and previous config saved to /var/cache/conftool/dbconfig/20220328-023804-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 02:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23322 and previous config saved to /var/cache/conftool/dbconfig/20220328-023756-ladsgroup.json
  • 02:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23321 and previous config saved to /var/cache/conftool/dbconfig/20220328-022251-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23320 and previous config saved to /var/cache/conftool/dbconfig/20220328-020746-ladsgroup.json
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23319 and previous config saved to /var/cache/conftool/dbconfig/20220328-015241-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23318 and previous config saved to /var/cache/conftool/dbconfig/20220328-012553-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23317 and previous config saved to /var/cache/conftool/dbconfig/20220328-012543-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23316 and previous config saved to /var/cache/conftool/dbconfig/20220328-011038-ladsgroup.json
  • 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23315 and previous config saved to /var/cache/conftool/dbconfig/20220328-005533-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23314 and previous config saved to /var/cache/conftool/dbconfig/20220328-004027-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23313 and previous config saved to /var/cache/conftool/dbconfig/20220328-001707-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance

2022-03-27

  • 23:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23312 and previous config saved to /var/cache/conftool/dbconfig/20220327-235516-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23311 and previous config saved to /var/cache/conftool/dbconfig/20220327-234011-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23310 and previous config saved to /var/cache/conftool/dbconfig/20220327-232506-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23309 and previous config saved to /var/cache/conftool/dbconfig/20220327-231001-ladsgroup.json
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23308 and previous config saved to /var/cache/conftool/dbconfig/20220327-224707-ladsgroup.json
  • 22:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23307 and previous config saved to /var/cache/conftool/dbconfig/20220327-224659-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23306 and previous config saved to /var/cache/conftool/dbconfig/20220327-223154-ladsgroup.json
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23305 and previous config saved to /var/cache/conftool/dbconfig/20220327-221649-ladsgroup.json
  • 22:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23304 and previous config saved to /var/cache/conftool/dbconfig/20220327-220143-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23303 and previous config saved to /var/cache/conftool/dbconfig/20220327-215440-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23302 and previous config saved to /var/cache/conftool/dbconfig/20220327-215432-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23301 and previous config saved to /var/cache/conftool/dbconfig/20220327-213927-ladsgroup.json
  • 21:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23300 and previous config saved to /var/cache/conftool/dbconfig/20220327-212422-ladsgroup.json
  • 21:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23299 and previous config saved to /var/cache/conftool/dbconfig/20220327-210917-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23298 and previous config saved to /var/cache/conftool/dbconfig/20220327-204604-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: sync
  • 20:20 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: sync
  • 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23297 and previous config saved to /var/cache/conftool/dbconfig/20220327-195258-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23296 and previous config saved to /var/cache/conftool/dbconfig/20220327-193753-ladsgroup.json
  • 19:35 _joe_: $ sudo cumin -b1 -s20 'A:mw-api and P{mw13[56-82].eqiad.wmnet}' 'restart-php7.2-fpm'
  • 19:25 _joe_: restarting php on mw1380
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23295 and previous config saved to /var/cache/conftool/dbconfig/20220327-192247-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23294 and previous config saved to /var/cache/conftool/dbconfig/20220327-190742-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23293 and previous config saved to /var/cache/conftool/dbconfig/20220327-184107-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23292 and previous config saved to /var/cache/conftool/dbconfig/20220327-184059-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23291 and previous config saved to /var/cache/conftool/dbconfig/20220327-182554-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23290 and previous config saved to /var/cache/conftool/dbconfig/20220327-181049-ladsgroup.json
  • 17:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23289 and previous config saved to /var/cache/conftool/dbconfig/20220327-175544-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23288 and previous config saved to /var/cache/conftool/dbconfig/20220327-165530-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23287 and previous config saved to /var/cache/conftool/dbconfig/20220327-165522-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23286 and previous config saved to /var/cache/conftool/dbconfig/20220327-164017-ladsgroup.json
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23285 and previous config saved to /var/cache/conftool/dbconfig/20220327-162511-ladsgroup.json
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23284 and previous config saved to /var/cache/conftool/dbconfig/20220327-161006-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23283 and previous config saved to /var/cache/conftool/dbconfig/20220327-154357-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23282 and previous config saved to /var/cache/conftool/dbconfig/20220327-145341-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23281 and previous config saved to /var/cache/conftool/dbconfig/20220327-143835-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23280 and previous config saved to /var/cache/conftool/dbconfig/20220327-142330-ladsgroup.json
  • 14:20 elukey: roll restart of wdqs-blazegraph-public codfw
  • 14:18 elukey: restart blazegraph on wdqs2003
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23279 and previous config saved to /var/cache/conftool/dbconfig/20220327-140825-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23278 and previous config saved to /var/cache/conftool/dbconfig/20220327-134411-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23277 and previous config saved to /var/cache/conftool/dbconfig/20220327-134358-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23276 and previous config saved to /var/cache/conftool/dbconfig/20220327-132852-ladsgroup.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23275 and previous config saved to /var/cache/conftool/dbconfig/20220327-131347-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23274 and previous config saved to /var/cache/conftool/dbconfig/20220327-125842-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23273 and previous config saved to /var/cache/conftool/dbconfig/20220327-125128-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23272 and previous config saved to /var/cache/conftool/dbconfig/20220327-125120-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23271 and previous config saved to /var/cache/conftool/dbconfig/20220327-123615-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23270 and previous config saved to /var/cache/conftool/dbconfig/20220327-122110-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23269 and previous config saved to /var/cache/conftool/dbconfig/20220327-120604-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23268 and previous config saved to /var/cache/conftool/dbconfig/20220327-114152-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23267 and previous config saved to /var/cache/conftool/dbconfig/20220327-112003-ladsgroup.json
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23266 and previous config saved to /var/cache/conftool/dbconfig/20220327-110457-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23265 and previous config saved to /var/cache/conftool/dbconfig/20220327-104952-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23264 and previous config saved to /var/cache/conftool/dbconfig/20220327-103447-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23263 and previous config saved to /var/cache/conftool/dbconfig/20220327-101022-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23262 and previous config saved to /var/cache/conftool/dbconfig/20220327-101014-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23261 and previous config saved to /var/cache/conftool/dbconfig/20220327-095509-ladsgroup.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23260 and previous config saved to /var/cache/conftool/dbconfig/20220327-094004-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23259 and previous config saved to /var/cache/conftool/dbconfig/20220327-092459-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23258 and previous config saved to /var/cache/conftool/dbconfig/20220327-085741-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23257 and previous config saved to /var/cache/conftool/dbconfig/20220327-085733-ladsgroup.json
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23256 and previous config saved to /var/cache/conftool/dbconfig/20220327-084228-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23255 and previous config saved to /var/cache/conftool/dbconfig/20220327-082723-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23254 and previous config saved to /var/cache/conftool/dbconfig/20220327-081218-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23253 and previous config saved to /var/cache/conftool/dbconfig/20220327-071203-ladsgroup.json
  • 07:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23252 and previous config saved to /var/cache/conftool/dbconfig/20220327-071156-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23251 and previous config saved to /var/cache/conftool/dbconfig/20220327-065651-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23250 and previous config saved to /var/cache/conftool/dbconfig/20220327-064146-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23249 and previous config saved to /var/cache/conftool/dbconfig/20220327-062641-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23248 and previous config saved to /var/cache/conftool/dbconfig/20220327-055108-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23247 and previous config saved to /var/cache/conftool/dbconfig/20220327-055100-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23246 and previous config saved to /var/cache/conftool/dbconfig/20220327-053555-ladsgroup.json
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23245 and previous config saved to /var/cache/conftool/dbconfig/20220327-052050-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23244 and previous config saved to /var/cache/conftool/dbconfig/20220327-050545-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23243 and previous config saved to /var/cache/conftool/dbconfig/20220327-044235-ladsgroup.json
  • 04:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 04:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23242 and previous config saved to /var/cache/conftool/dbconfig/20220327-042041-ladsgroup.json
  • 04:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23241 and previous config saved to /var/cache/conftool/dbconfig/20220327-040536-ladsgroup.json
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23240 and previous config saved to /var/cache/conftool/dbconfig/20220327-035031-ladsgroup.json
  • 03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23239 and previous config saved to /var/cache/conftool/dbconfig/20220327-033526-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23238 and previous config saved to /var/cache/conftool/dbconfig/20220327-031115-ladsgroup.json
  • 03:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 03:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 03:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23237 and previous config saved to /var/cache/conftool/dbconfig/20220327-031108-ladsgroup.json
  • 02:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23236 and previous config saved to /var/cache/conftool/dbconfig/20220327-025603-ladsgroup.json
  • 02:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23235 and previous config saved to /var/cache/conftool/dbconfig/20220327-024057-ladsgroup.json
  • 02:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23234 and previous config saved to /var/cache/conftool/dbconfig/20220327-022552-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23233 and previous config saved to /var/cache/conftool/dbconfig/20220327-015848-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23232 and previous config saved to /var/cache/conftool/dbconfig/20220327-015840-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23231 and previous config saved to /var/cache/conftool/dbconfig/20220327-014335-ladsgroup.json
  • 01:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23230 and previous config saved to /var/cache/conftool/dbconfig/20220327-012829-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23229 and previous config saved to /var/cache/conftool/dbconfig/20220327-011324-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23228 and previous config saved to /var/cache/conftool/dbconfig/20220327-005010-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 00:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23227 and previous config saved to /var/cache/conftool/dbconfig/20220327-000023-ladsgroup.json

2022-03-26

  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23226 and previous config saved to /var/cache/conftool/dbconfig/20220326-234517-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23225 and previous config saved to /var/cache/conftool/dbconfig/20220326-233012-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23224 and previous config saved to /var/cache/conftool/dbconfig/20220326-231507-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23223 and previous config saved to /var/cache/conftool/dbconfig/20220326-224955-ladsgroup.json
  • 22:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23222 and previous config saved to /var/cache/conftool/dbconfig/20220326-224947-ladsgroup.json
  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23221 and previous config saved to /var/cache/conftool/dbconfig/20220326-223442-ladsgroup.json
  • 22:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23220 and previous config saved to /var/cache/conftool/dbconfig/20220326-221937-ladsgroup.json
  • 22:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23219 and previous config saved to /var/cache/conftool/dbconfig/20220326-220432-ladsgroup.json
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23218 and previous config saved to /var/cache/conftool/dbconfig/20220326-210417-ladsgroup.json
  • 21:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23217 and previous config saved to /var/cache/conftool/dbconfig/20220326-210409-ladsgroup.json
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23216 and previous config saved to /var/cache/conftool/dbconfig/20220326-204904-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23214 and previous config saved to /var/cache/conftool/dbconfig/20220326-203359-ladsgroup.json
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23213 and previous config saved to /var/cache/conftool/dbconfig/20220326-201854-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23212 and previous config saved to /var/cache/conftool/dbconfig/20220326-195245-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23211 and previous config saved to /var/cache/conftool/dbconfig/20220326-190244-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23210 and previous config saved to /var/cache/conftool/dbconfig/20220326-184739-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23209 and previous config saved to /var/cache/conftool/dbconfig/20220326-183234-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23208 and previous config saved to /var/cache/conftool/dbconfig/20220326-181729-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23207 and previous config saved to /var/cache/conftool/dbconfig/20220326-175315-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23206 and previous config saved to /var/cache/conftool/dbconfig/20220326-175302-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23205 and previous config saved to /var/cache/conftool/dbconfig/20220326-173757-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23204 and previous config saved to /var/cache/conftool/dbconfig/20220326-172250-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23203 and previous config saved to /var/cache/conftool/dbconfig/20220326-170745-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23202 and previous config saved to /var/cache/conftool/dbconfig/20220326-170047-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23201 and previous config saved to /var/cache/conftool/dbconfig/20220326-170039-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23200 and previous config saved to /var/cache/conftool/dbconfig/20220326-164534-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23199 and previous config saved to /var/cache/conftool/dbconfig/20220326-163029-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23198 and previous config saved to /var/cache/conftool/dbconfig/20220326-161523-ladsgroup.json
  • 16:00 Amir1: start of mwscript maintenance/migrateLinksTable.php --wiki enwiki --table templatelinks --sleep 2 on beta cluster (T299424)
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23197 and previous config saved to /var/cache/conftool/dbconfig/20220326-155025-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23196 and previous config saved to /var/cache/conftool/dbconfig/20220326-152835-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23195 and previous config saved to /var/cache/conftool/dbconfig/20220326-151330-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23194 and previous config saved to /var/cache/conftool/dbconfig/20220326-145825-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23193 and previous config saved to /var/cache/conftool/dbconfig/20220326-144320-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23192 and previous config saved to /var/cache/conftool/dbconfig/20220326-141912-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23191 and previous config saved to /var/cache/conftool/dbconfig/20220326-141904-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23190 and previous config saved to /var/cache/conftool/dbconfig/20220326-140359-ladsgroup.json
  • 13:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23189 and previous config saved to /var/cache/conftool/dbconfig/20220326-134854-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23188 and previous config saved to /var/cache/conftool/dbconfig/20220326-133349-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23187 and previous config saved to /var/cache/conftool/dbconfig/20220326-130701-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23186 and previous config saved to /var/cache/conftool/dbconfig/20220326-130653-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23185 and previous config saved to /var/cache/conftool/dbconfig/20220326-125148-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23184 and previous config saved to /var/cache/conftool/dbconfig/20220326-123643-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23183 and previous config saved to /var/cache/conftool/dbconfig/20220326-122136-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23182 and previous config saved to /var/cache/conftool/dbconfig/20220326-112122-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23181 and previous config saved to /var/cache/conftool/dbconfig/20220326-112114-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23180 and previous config saved to /var/cache/conftool/dbconfig/20220326-110609-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23179 and previous config saved to /var/cache/conftool/dbconfig/20220326-105104-ladsgroup.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23178 and previous config saved to /var/cache/conftool/dbconfig/20220326-103559-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23177 and previous config saved to /var/cache/conftool/dbconfig/20220326-100918-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23176 and previous config saved to /var/cache/conftool/dbconfig/20220326-100911-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23175 and previous config saved to /var/cache/conftool/dbconfig/20220326-095405-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23174 and previous config saved to /var/cache/conftool/dbconfig/20220326-093900-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23173 and previous config saved to /var/cache/conftool/dbconfig/20220326-092355-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23172 and previous config saved to /var/cache/conftool/dbconfig/20220326-085938-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 08:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23171 and previous config saved to /var/cache/conftool/dbconfig/20220326-083731-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23170 and previous config saved to /var/cache/conftool/dbconfig/20220326-082225-ladsgroup.json
  • 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23169 and previous config saved to /var/cache/conftool/dbconfig/20220326-080720-ladsgroup.json
  • 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23168 and previous config saved to /var/cache/conftool/dbconfig/20220326-075215-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23167 and previous config saved to /var/cache/conftool/dbconfig/20220326-072702-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23166 and previous config saved to /var/cache/conftool/dbconfig/20220326-072654-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23165 and previous config saved to /var/cache/conftool/dbconfig/20220326-071149-ladsgroup.json
  • 06:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23164 and previous config saved to /var/cache/conftool/dbconfig/20220326-065644-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23163 and previous config saved to /var/cache/conftool/dbconfig/20220326-064139-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23162 and previous config saved to /var/cache/conftool/dbconfig/20220326-062131-ladsgroup.json
  • 06:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23161 and previous config saved to /var/cache/conftool/dbconfig/20220326-062123-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23160 and previous config saved to /var/cache/conftool/dbconfig/20220326-060618-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23159 and previous config saved to /var/cache/conftool/dbconfig/20220326-055113-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23158 and previous config saved to /var/cache/conftool/dbconfig/20220326-053607-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23157 and previous config saved to /var/cache/conftool/dbconfig/20220326-051140-ladsgroup.json
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 04:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 04:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 04:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23156 and previous config saved to /var/cache/conftool/dbconfig/20220326-042136-ladsgroup.json
  • 04:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23155 and previous config saved to /var/cache/conftool/dbconfig/20220326-040631-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23154 and previous config saved to /var/cache/conftool/dbconfig/20220326-035126-ladsgroup.json
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23153 and previous config saved to /var/cache/conftool/dbconfig/20220326-033621-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23152 and previous config saved to /var/cache/conftool/dbconfig/20220326-025754-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23151 and previous config saved to /var/cache/conftool/dbconfig/20220326-025746-ladsgroup.json
  • 02:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23150 and previous config saved to /var/cache/conftool/dbconfig/20220326-024241-ladsgroup.json
  • 02:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23149 and previous config saved to /var/cache/conftool/dbconfig/20220326-022736-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23148 and previous config saved to /var/cache/conftool/dbconfig/20220326-021231-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23147 and previous config saved to /var/cache/conftool/dbconfig/20220326-011216-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23146 and previous config saved to /var/cache/conftool/dbconfig/20220326-011209-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23145 and previous config saved to /var/cache/conftool/dbconfig/20220326-005704-ladsgroup.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23144 and previous config saved to /var/cache/conftool/dbconfig/20220326-004159-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23143 and previous config saved to /var/cache/conftool/dbconfig/20220326-002653-ladsgroup.json

2022-03-25

  • 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23142 and previous config saved to /var/cache/conftool/dbconfig/20220325-235855-ladsgroup.json
  • 23:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23141 and previous config saved to /var/cache/conftool/dbconfig/20220325-230540-ladsgroup.json
  • 22:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23140 and previous config saved to /var/cache/conftool/dbconfig/20220325-225035-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23139 and previous config saved to /var/cache/conftool/dbconfig/20220325-223530-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23138 and previous config saved to /var/cache/conftool/dbconfig/20220325-222025-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23137 and previous config saved to /var/cache/conftool/dbconfig/20220325-215400-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23136 and previous config saved to /var/cache/conftool/dbconfig/20220325-215346-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23135 and previous config saved to /var/cache/conftool/dbconfig/20220325-213841-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23134 and previous config saved to /var/cache/conftool/dbconfig/20220325-212336-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23133 and previous config saved to /var/cache/conftool/dbconfig/20220325-210831-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23132 and previous config saved to /var/cache/conftool/dbconfig/20220325-210136-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23131 and previous config saved to /var/cache/conftool/dbconfig/20220325-210128-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23130 and previous config saved to /var/cache/conftool/dbconfig/20220325-204623-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23129 and previous config saved to /var/cache/conftool/dbconfig/20220325-203118-ladsgroup.json
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23128 and previous config saved to /var/cache/conftool/dbconfig/20220325-201613-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23127 and previous config saved to /var/cache/conftool/dbconfig/20220325-195137-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23126 and previous config saved to /var/cache/conftool/dbconfig/20220325-192923-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23125 and previous config saved to /var/cache/conftool/dbconfig/20220325-191416-ladsgroup.json
  • 19:10 mutante: copying dump from deploy server to dumps server: scp -3 deploy1002.eqiad.wmnet:/srv/miscweb/static-bugzilla.tar.gz labstore1006.wikimedia.org:~ (T284193)
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23124 and previous config saved to /var/cache/conftool/dbconfig/20220325-185911-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23123 and previous config saved to /var/cache/conftool/dbconfig/20220325-184406-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23122 and previous config saved to /var/cache/conftool/dbconfig/20220325-181439-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23121 and previous config saved to /var/cache/conftool/dbconfig/20220325-181431-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23120 and previous config saved to /var/cache/conftool/dbconfig/20220325-175926-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23119 and previous config saved to /var/cache/conftool/dbconfig/20220325-174421-ladsgroup.json
  • 17:42 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23118 and previous config saved to /var/cache/conftool/dbconfig/20220325-172916-ladsgroup.json
  • 17:14 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23117 and previous config saved to /var/cache/conftool/dbconfig/20220325-170154-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23116 and previous config saved to /var/cache/conftool/dbconfig/20220325-170146-ladsgroup.json
  • 16:57 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23115 and previous config saved to /var/cache/conftool/dbconfig/20220325-164641-ladsgroup.json
  • 16:37 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:34 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P23114 and previous config saved to /var/cache/conftool/dbconfig/20220325-163136-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23112 and previous config saved to /var/cache/conftool/dbconfig/20220325-161631-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23111 and previous config saved to /var/cache/conftool/dbconfig/20220325-154705-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23110 and previous config saved to /var/cache/conftool/dbconfig/20220325-154658-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23109 and previous config saved to /var/cache/conftool/dbconfig/20220325-153152-ladsgroup.json
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P23108 and previous config saved to /var/cache/conftool/dbconfig/20220325-151647-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23107 and previous config saved to /var/cache/conftool/dbconfig/20220325-150141-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298565)', diff saved to https://phabricator.wikimedia.org/P23101 and previous config saved to /var/cache/conftool/dbconfig/20220325-143545-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23100 and previous config saved to /var/cache/conftool/dbconfig/20220325-141301-ladsgroup.json
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23099 and previous config saved to /var/cache/conftool/dbconfig/20220325-140850-root.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23098 and previous config saved to /var/cache/conftool/dbconfig/20220325-135756-ladsgroup.json
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23097 and previous config saved to /var/cache/conftool/dbconfig/20220325-135346-root.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P23096 and previous config saved to /var/cache/conftool/dbconfig/20220325-134251-ladsgroup.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23095 and previous config saved to /var/cache/conftool/dbconfig/20220325-133842-root.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23094 and previous config saved to /var/cache/conftool/dbconfig/20220325-132746-ladsgroup.json
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23093 and previous config saved to /var/cache/conftool/dbconfig/20220325-132338-root.json
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23092 and previous config saved to /var/cache/conftool/dbconfig/20220325-130834-root.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23091 and previous config saved to /var/cache/conftool/dbconfig/20220325-130146-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23090 and previous config saved to /var/cache/conftool/dbconfig/20220325-130138-ladsgroup.json
  • 12:49 hoo: Updated operations/dumps/dcat on snapshot10(08|09|11|12|13) from d4886f6 to a1f46e4
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23089 and previous config saved to /var/cache/conftool/dbconfig/20220325-124633-ladsgroup.json
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P23088 and previous config saved to /var/cache/conftool/dbconfig/20220325-123128-ladsgroup.json
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23086 and previous config saved to /var/cache/conftool/dbconfig/20220325-121623-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298565)', diff saved to https://phabricator.wikimedia.org/P23085 and previous config saved to /var/cache/conftool/dbconfig/20220325-120708-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23084 and previous config saved to /var/cache/conftool/dbconfig/20220325-120701-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23083 and previous config saved to /var/cache/conftool/dbconfig/20220325-115156-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P23082 and previous config saved to /var/cache/conftool/dbconfig/20220325-113651-ladsgroup.json
  • 11:24 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23081 and previous config saved to /var/cache/conftool/dbconfig/20220325-112145-ladsgroup.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23080 and previous config saved to /var/cache/conftool/dbconfig/20220325-110217-marostegui.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23079 and previous config saved to /var/cache/conftool/dbconfig/20220325-104712-marostegui.json
  • 10:33 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298565)', diff saved to https://phabricator.wikimedia.org/P23078 and previous config saved to /var/cache/conftool/dbconfig/20220325-103310-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P23077 and previous config saved to /var/cache/conftool/dbconfig/20220325-103207-marostegui.json
  • 10:22 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
  • 10:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1005.eqiad.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23076 and previous config saved to /var/cache/conftool/dbconfig/20220325-101701-marostegui.json
  • 10:11 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1005.eqiad.wmnet
  • 10:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23075 and previous config saved to /var/cache/conftool/dbconfig/20220325-101016-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23074 and previous config saved to /var/cache/conftool/dbconfig/20220325-095511-ladsgroup.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T302658)', diff saved to https://phabricator.wikimedia.org/P23073 and previous config saved to /var/cache/conftool/dbconfig/20220325-094031-marostegui.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23072 and previous config saved to /var/cache/conftool/dbconfig/20220325-094023-marostegui.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P23071 and previous config saved to /var/cache/conftool/dbconfig/20220325-094006-ladsgroup.json
  • 09:27 moritzm: updating libapache2-mod-auth-cas on moscovium/debmonitor1002
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23070 and previous config saved to /var/cache/conftool/dbconfig/20220325-092518-marostegui.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23069 and previous config saved to /var/cache/conftool/dbconfig/20220325-092500-ladsgroup.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P23068 and previous config saved to /var/cache/conftool/dbconfig/20220325-091013-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23067 and previous config saved to /var/cache/conftool/dbconfig/20220325-085508-marostegui.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298565)', diff saved to https://phabricator.wikimedia.org/P23066 and previous config saved to /var/cache/conftool/dbconfig/20220325-082446-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T302658)', diff saved to https://phabricator.wikimedia.org/P23065 and previous config saved to /var/cache/conftool/dbconfig/20220325-080403-marostegui.json
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23064 and previous config saved to /var/cache/conftool/dbconfig/20220325-080355-marostegui.json
  • 08:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23063 and previous config saved to /var/cache/conftool/dbconfig/20220325-075610-ladsgroup.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23062 and previous config saved to /var/cache/conftool/dbconfig/20220325-074850-marostegui.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23061 and previous config saved to /var/cache/conftool/dbconfig/20220325-074105-ladsgroup.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P23060 and previous config saved to /var/cache/conftool/dbconfig/20220325-073345-marostegui.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P23059 and previous config saved to /var/cache/conftool/dbconfig/20220325-072559-ladsgroup.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23058 and previous config saved to /var/cache/conftool/dbconfig/20220325-071840-marostegui.json
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23057 and previous config saved to /var/cache/conftool/dbconfig/20220325-071054-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298565)', diff saved to https://phabricator.wikimedia.org/P23056 and previous config saved to /var/cache/conftool/dbconfig/20220325-064139-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:31 _joe_: deleting a couple zotero pods with excessive number of restarts
  • 06:29 marostegui: dbmaint s4@eqiad T300775
  • 06:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for schema change', diff saved to https://phabricator.wikimedia.org/P23055 and previous config saved to /var/cache/conftool/dbconfig/20220325-060723-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T302658)', diff saved to https://phabricator.wikimedia.org/P23054 and previous config saved to /var/cache/conftool/dbconfig/20220325-054705-marostegui.json
  • 05:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134 for testing', diff saved to https://phabricator.wikimedia.org/P23053 and previous config saved to /var/cache/conftool/dbconfig/20220325-053037-marostegui.json
  • 00:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host restbase2027.codfw.wmnet with OS buster

2022-03-24

  • 23:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS buster
  • 23:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23050 and previous config saved to /var/cache/conftool/dbconfig/20220324-223031-marostegui.json
  • 22:19 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23049 and previous config saved to /var/cache/conftool/dbconfig/20220324-221526-marostegui.json
  • 22:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host restbase2027.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 22:07 ebernhardson: restart wcqs-blazegraph on wcqs2001 to resolve intermittent BlazegraphFreeAllocatorsDecreasingRapidly
  • 22:06 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P23048 and previous config saved to /var/cache/conftool/dbconfig/20220324-220021-marostegui.json
  • 21:54 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23047 and previous config saved to /var/cache/conftool/dbconfig/20220324-214515-marostegui.json
  • 21:42 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:38 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:11 inflatador: bking@cumin1001 restarting blazegraph on wdqs[1003-1013].eqiad.wmnet for T293862
  • 21:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4338532: fawiki: Set celebration logo for new vector (T304314; 2/2) (duration: 00m 53s)
  • 21:07 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-fawiki-new-year.png: 4338532: fawiki: Set celebration logo for new vector (T304314; 1/2) (duration: 00m 50s)
  • 21:07 thcipriani@deploy1002: Finished deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0 (duration: 00m 13s)
  • 21:07 thcipriani@deploy1002: Started deploy [releng/phatality@15f8ec0]: Deploying phatality updates for opensearch 1.2.0
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 00m 50s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:43 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Start writing to $wmgAllServices the same value as to $wmfAllServices (T45956) (duration: 01m 17s)
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Stop writing to certain $wmf* global variables (T45956) (part 3) (duration: 00m 55s)
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:29 thcipriani@deploy1002: Synchronized docroot/noc/db.php: Config: Stop writing to certain $wmf* global variables (T45956) (part II) (duration: 00m 51s)
  • 20:28 thcipriani@deploy1002: Synchronized tests: Config: Stop writing to certain $wmf* global variables (T45956) (part I) (duration: 00m 50s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:23 thcipriani@deploy1002: Synchronized portals: Config: Bumping portals to master (T282012) (duration: 00m 52s)
  • 20:22 thcipriani@deploy1002: Synchronized portals/wikipedia.org/assets: Config: Bumping portals to master (T282012) (duration: 00m 52s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 20:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23045 and previous config saved to /var/cache/conftool/dbconfig/20220324-201305-marostegui.json
  • 20:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 20:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23044 and previous config saved to /var/cache/conftool/dbconfig/20220324-201257-marostegui.json
  • 20:08 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Use $wmgUseRestbaseVRS in comment (T45956) (duration: 01m 05s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23043 and previous config saved to /var/cache/conftool/dbconfig/20220324-195752-marostegui.json
  • 19:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P23042 and previous config saved to /var/cache/conftool/dbconfig/20220324-194246-marostegui.json
  • 19:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23041 and previous config saved to /var/cache/conftool/dbconfig/20220324-192741-marostegui.json
  • 19:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1148.eqiad.wmnet with OS buster
  • 19:20 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1147.eqiad.wmnet with OS buster
  • 19:02 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1142.eqiad.wmnet with OS buster
  • 18:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
  • 18:41 cstone: civicrm revision changed from b6ceb722 to 4e5b37c3
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23040 and previous config saved to /var/cache/conftool/dbconfig/20220324-183654-root.json
  • 18:36 razzi: razzi@deneb:~$ sudo docker system prune (reclaimed 33GB)
  • 18:35 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1146.eqiad.wmnet with OS buster
  • 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1144.eqiad.wmnet with OS buster
  • 18:28 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1145.eqiad.wmnet with OS buster
  • 18:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1143.eqiad.wmnet with OS buster
  • 18:26 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1142.eqiad.wmnet with OS buster
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23039 and previous config saved to /var/cache/conftool/dbconfig/20220324-182150-root.json
  • 18:17 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS buster
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS buster
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS buster
  • 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23038 and previous config saved to /var/cache/conftool/dbconfig/20220324-180646-root.json
  • 18:05 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 17:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS buster
  • 17:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS buster
  • 17:58 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS buster
  • 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS buster
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23037 and previous config saved to /var/cache/conftool/dbconfig/20220324-175142-root.json
  • 17:44 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 17:36 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23036 and previous config saved to /var/cache/conftool/dbconfig/20220324-173638-root.json
  • 17:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 17:36 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 17:36 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:35 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23035 and previous config saved to /var/cache/conftool/dbconfig/20220324-173450-marostegui.json
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 17:34 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:32 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 17:32 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 17:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:12 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:10 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:07 urbanecm@deploy1002: Synchronized logos/config.yaml: 05d55a9: fawiki: Set new year celebration (T304314; 3/3) (duration: 00m 49s)
  • 17:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:06 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 05d55a9: fawiki: Set new year celebration (T304314; 2/3) (duration: 00m 49s)
  • 17:04 urbanecm@deploy1002: Synchronized static/images/project-logos/: 05d55a9: fawiki: Set new year celebration (T304314; 1/3) (duration: 00m 50s)
  • 17:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:00 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1145.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1146.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS bullseye
  • 16:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1146.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:36 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1144.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
  • 16:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1145.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
  • 16:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4 refs T300203
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:26 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:25 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4 refs T300203 (duration: 01m 06s)
  • 16:25 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 16:24 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4 refs T300203
  • 16:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 16:20 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs T300203
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1144.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:19 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:18 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS bullseye
  • 16:13 brennen: trainsperiment (T300203): blockers clear, logs triaged, rolling 1.39.0-wmf.4 out to all wikis again
  • 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1143.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1142.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:07 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 15:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
  • 15:51 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
  • 15:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1142.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 15:24 XioNoX: codfw: disable BGP to DE-CIX for link move
  • 15:03 moritzm: installing openssl1.0 security updates on stretch
  • 14:39 moritzm: installing containerd updates on ml-serve*
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T302658)', diff saved to https://phabricator.wikimedia.org/P23030 and previous config saved to /var/cache/conftool/dbconfig/20220324-143149-marostegui.json
  • 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 14:26 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23029 and previous config saved to /var/cache/conftool/dbconfig/20220324-142233-root.json
  • 14:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2002-dev.codfw.wmnet with OS bullseye
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23028 and previous config saved to /var/cache/conftool/dbconfig/20220324-140729-root.json
  • 14:00 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: host reimage
  • 13:57 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2002-dev.codfw.wmnet with reason: host reimage
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23026 and previous config saved to /var/cache/conftool/dbconfig/20220324-135225-root.json
  • 13:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2002-dev.codfw.wmnet with OS bullseye
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23025 and previous config saved to /var/cache/conftool/dbconfig/20220324-133721-root.json
  • 13:34 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 13:26 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T45956 (duration: 00m 49s)
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 reedy@deploy1002: Synchronized multiversion/: T45956 (duration: 00m 50s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23024 and previous config saved to /var/cache/conftool/dbconfig/20220324-132217-root.json
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 13:18 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:15 reedy@deploy1002: Synchronized tests/: T45956 (duration: 00m 49s)
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T292802 (duration: 00m 50s)
  • 12:54 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 for schema change', diff saved to https://phabricator.wikimedia.org/P23023 and previous config saved to /var/cache/conftool/dbconfig/20220324-125225-marostegui.json
  • 11:47 jynus: updating eqiad swift-commonswiki backups of originals T299764
  • 11:26 mmandere: pool cp1076 with HAProxy as TLS termination layer - T290005
  • 11:22 jbond: puppet cert clean rendering.svc.eqiad.wmnet
  • 11:21 jbond: removing old api.svc.codfw.wmnet.pem and appservers.svc.codfw.wmnet.pem from root@puppetmaster1001:/var/lib/puppet/server/ssl/ca/signed#
  • 11:15 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 11:14 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:10 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1076.eqiad.wmnet with OS buster
  • 11:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 11:00 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 10:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet
  • 10:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet
  • 10:46 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 10:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 10:43 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1076.eqiad.wmnet with reason: host reimage
  • 10:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet
  • 10:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet
  • 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 10:34 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet
  • 10:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet
  • 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 10:27 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1076.eqiad.wmnet with OS buster
  • 10:26 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet
  • 10:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 10:20 mmandere: depool cp1076 for reimage - T290005
  • 10:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet
  • 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 10:01 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet
  • 09:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet
  • 09:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1096.eqiad.wmnet
  • 09:31 mmandere: pool cp1078 with HAProxy as TLS termination layer - T290005
  • 09:30 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1078.eqiad.wmnet with OS buster
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:28 jnuche@deploy1002: Synchronized php-1.39.0-wmf.4/includes/Linker.php: (no justification provided) (duration: 00m 50s)
  • 09:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:08 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 09:05 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1078.eqiad.wmnet with reason: host reimage
  • 09:00 oblivian@puppetmaster1001: conftool action : set/enabled=true; selector: name=parameter_q,cluster=cache-text
  • 08:48 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1078.eqiad.wmnet with OS buster
  • 08:45 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
  • 08:44 marostegui: dbmaint s7@eqiad T302658
  • 08:43 oblivian@puppetmaster1001: conftool action : set/enabled=true; selector: name=parameter_q,cluster=cache-text
  • 08:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 08:36 mmandere: depool cp1078 for reimage - T290005
  • 08:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 08:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 08:12 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 08:11 marostegui: dbmaint s7@codfw T302658
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After testing', diff saved to https://phabricator.wikimedia.org/P23022 and previous config saved to /var/cache/conftool/dbconfig/20220324-080528-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After testing', diff saved to https://phabricator.wikimedia.org/P23021 and previous config saved to /var/cache/conftool/dbconfig/20220324-075024-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P23020 and previous config saved to /var/cache/conftool/dbconfig/20220324-074841-root.json
  • 07:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After testing', diff saved to https://phabricator.wikimedia.org/P23019 and previous config saved to /var/cache/conftool/dbconfig/20220324-073520-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P23018 and previous config saved to /var/cache/conftool/dbconfig/20220324-073337-root.json
  • 07:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After testing', diff saved to https://phabricator.wikimedia.org/P23017 and previous config saved to /var/cache/conftool/dbconfig/20220324-072017-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P23016 and previous config saved to /var/cache/conftool/dbconfig/20220324-071832-root.json
  • 07:08 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After testing', diff saved to https://phabricator.wikimedia.org/P23015 and previous config saved to /var/cache/conftool/dbconfig/20220324-070513-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P23014 and previous config saved to /var/cache/conftool/dbconfig/20220324-070327-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166 for testing', diff saved to https://phabricator.wikimedia.org/P23013 and previous config saved to /var/cache/conftool/dbconfig/20220324-065940-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P23012 and previous config saved to /var/cache/conftool/dbconfig/20220324-064823-root.json
  • 06:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 01:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 01:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bullseye
  • 00:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bullseye
  • 00:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bullseye
  • 00:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bullseye
  • 00:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 00:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 00:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 00:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 00:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 00:02 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage

2022-03-23

  • 23:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bullseye
  • 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bullseye
  • 23:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bullseye
  • 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:38 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3 refs T300203
  • 23:34 brennen: trainsperiment (T300203): reverting to 1.39.0-wmf.3 on all wikis for T304564; will move forward again after a fix.
  • 23:25 cwhite: remove openjdk-8-jre from codfw logstash nodes T301770
  • 23:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 22:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bullseye
  • 22:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bullseye
  • 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 22:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:19 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bullseye
  • 22:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bullseye
  • 21:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bullseye
  • 21:42 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 21:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 21:31 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 21:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:24 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Enable split A/B testing on beta cluster (T301584) (duration: 00m 50s)
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bullseye
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Allow autoconfirmed users to view basic IP information (T303858) and Enable IPInfo on testwiki (T260598) (duration: 00m 50s)
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bullseye
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bullseye
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bullseye
  • 20:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 catrope@deploy1002: Synchronized wmf-config/extension-list: Config: DynamicSidebar: remove unused extension (T304006) (duration: 00m 49s)
  • 20:34 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: DynamicSidebar: remove from InitialiseSettings (duration: 00m 51s)
  • 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 20:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 20:32 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bullseye
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:18 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bullseye
  • 20:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bullseye
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 catrope@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: DynamicSidebar: remove from CommonSettings (T304006) (duration: 00m 50s)
  • 20:10 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wikitech: Remove DynamicSidebar (T304006) (duration: 00m 52s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:01 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:53 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:37 brennen: trainsperiment (T300203): 1.39.0-wmf.4 on all wikis; logs seem clean - end of train deployment activities for the week, unless bugs emerge
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.4 refs T300203
  • 19:23 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bullseye
  • 19:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bullseye
  • 19:10 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic ES 6.8 upgrade - bking@cumin1001 - T301956
  • 19:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:09 brennen@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.4 refs T300203 (duration: 00m 52s)
  • 19:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:08 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.4 refs T300203
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs T300203
  • 18:56 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 18:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 18:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.4 refs T300203
  • 18:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:51 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 18:50 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 18:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:47 brennen: trainsperiment (T300203): 1.39.0-wmf.4 on testwikis; proceeding to groups 0-2 at 15-minute intervals to watch logs
  • 18:46 brennen@deploy1002: Pruned MediaWiki: 1.38.0-wmf.26 (duration: 02m 05s)
  • 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:42 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.4 refs T300203 (duration: 49m 41s)
  • 18:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bullseye
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bullseye
  • 18:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:52 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.4 refs T300203
  • 17:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bullseye
  • 17:48 brennen: trainsperiment (T300203): starting prep for 1.39.0-wmf.4
  • 17:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bullseye
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 17:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 17:22 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 17:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 17:14 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 17:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 17:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bullseye
  • 16:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bullseye
  • 16:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 16:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 16:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 16:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bullseye
  • 16:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 16:25 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 16:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 16:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 16:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 15:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bullseye
  • 15:39 urbanecm: foreachwikiindblist wikipedia extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # T304052
  • 15:38 urbanecm: Created shnwikivoyage and guwwiki
  • 15:31 mmandere: pool cp1080 with HAProxy as TLS termination layer - T290005
  • 15:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1080.eqiad.wmnet with OS buster
  • 15:27 urbanecm@deploy1002: Synchronized langlist: Creating guwwiki (T303727) (duration: 01m 04s)
  • 15:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiki (T303727) (duration: 01m 07s)
  • 15:25 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiki (T303727) (duration: 01m 05s)
  • 15:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bullseye
  • 15:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiki (T303727) (duration: 01m 06s)
  • 15:23 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiki (T303727)
  • 15:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:21 urbanecm@deploy1002: Synchronized dblists: Creating guwwiki (T303727) (duration: 01m 10s)
  • 15:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:19 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiki (T303727) (duration: 01m 05s)
  • 15:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:14 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:13 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:12 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shnwikivoyage (T302797)
  • 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:09 urbanecm@deploy1002: Synchronized dblists: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:08 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating shnwikivoyage (T302797) (duration: 01m 05s)
  • 15:05 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 15:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 15:01 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1080.eqiad.wmnet with reason: host reimage
  • 15:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1030.eqiad.wmnet with OS bullseye
  • 14:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 14:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 14:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bullseye
  • 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
  • 14:44 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1030.eqiad.wmnet with reason: host reimage
  • 14:44 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1080.eqiad.wmnet with OS buster
  • 14:41 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.3/extensions/WikimediaMaintenance/addWiki.php: 9a0aed0: addWiki: Create GrowthExperiment tables for all new Wikipedias (T304052) (duration: 01m 06s)
  • 14:38 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp1085.eqiad.wmnet
  • 14:37 mmandere: depool cp1080 for reimage - T290005
  • 14:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1030.eqiad.wmnet with OS bullseye
  • 14:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:28 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 14:23 bblack: reboot cp1085 (downtimed)
  • 14:20 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:19 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wcqs1002.eqiad.wmnet
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1029.eqiad.wmnet with OS bullseye
  • 14:11 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1027.eqiad.wmnet with OS bullseye
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:06 mmandere: pool cp1082 with HAProxy as TLS termination layer - T290005
  • 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:04 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:04 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 14:00 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1082.eqiad.wmnet with OS buster
  • 14:00 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:57 bking@cumin1001: START - Cookbook sre.wdqs.reboot
  • 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1029.eqiad.wmnet with reason: host reimage
  • 13:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 13:50 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 13:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:48 Lucas_WMDE: UTC afternoon backport window done
  • 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Enable Wikibase REST API on beta wikidata (T302959) (2/2, production no-op) (duration: 01m 05s)
  • 13:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable Wikibase REST API on beta wikidata (T302959) (1/2, production no-op) (duration: 01m 07s)
  • 13:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1027.eqiad.wmnet with reason: host reimage
  • 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1029.eqiad.wmnet with OS bullseye
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300775)', diff saved to https://phabricator.wikimedia.org/P23010 and previous config saved to /var/cache/conftool/dbconfig/20220323-134153-marostegui.json
  • 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300775)', diff saved to https://phabricator.wikimedia.org/P23009 and previous config saved to /var/cache/conftool/dbconfig/20220323-134140-marostegui.json
  • 13:39 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Write "unexpectedUnconnectedPage" page prop on Test Wikidata clients (duration: 01m 10s)
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 13:38 moritzm: restarting superset for OpenSSL update
  • 13:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1027.eqiad.wmnet with OS bullseye
  • 13:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 13:33 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1082.eqiad.wmnet with reason: host reimage
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23008 and previous config saved to /var/cache/conftool/dbconfig/20220323-132635-marostegui.json
  • 13:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 13:16 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1082.eqiad.wmnet with OS buster
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P23005 and previous config saved to /var/cache/conftool/dbconfig/20220323-131130-marostegui.json
  • 13:07 mmandere: depool cp1082 for reimage - T290005
  • 12:58 moritzm: installing bind security updates
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300775)', diff saved to https://phabricator.wikimedia.org/P23004 and previous config saved to /var/cache/conftool/dbconfig/20220323-125625-marostegui.json
  • 12:29 moritzm: restarting Turnilo for OpenSSL update
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P23003 and previous config saved to /var/cache/conftool/dbconfig/20220323-120749-marostegui.json
  • 11:34 jbond: upload new puppetboard_3.1.0-1+deb11u1_all.deb
  • 11:33 moritzm: installing apache security updates on stretch
  • 11:00 mmandere: pool cp1081 with HAProxy as TLS termination layer - T290005
  • 10:58 moritzm: restarting apache on matomo1002/piwik.wikimedia.org
  • 10:52 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1081.eqiad.wmnet with OS buster
  • 10:30 moritzm: restarting ntpd
  • 10:28 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 10:24 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1081.eqiad.wmnet with reason: host reimage
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight T301879', diff saved to https://phabricator.wikimedia.org/P23002 and previous config saved to /var/cache/conftool/dbconfig/20220323-101816-marostegui.json
  • 10:07 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1081.eqiad.wmnet with OS buster
  • 09:56 mmandere: depool cp1081 for reimage - T290005
  • 09:43 mmandere: pool cp1079 with HAProxy as TLS termination layer - T290005
  • 09:36 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1079.eqiad.wmnet with OS buster
  • 09:24 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:17 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:15 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 09:11 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1079.eqiad.wmnet with reason: host reimage
  • 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:54 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
  • 08:54 moritzm: restarting spamassassin/clamav on otrs1001/ticket.wikimedia.org
  • 08:51 mmandere@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1079.eqiad.wmnet with OS buster
  • 08:47 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1079.eqiad.wmnet with OS buster
  • 08:43 moritzm: installing openssl security updates
  • 08:36 mmandere: depool cp1079 for reimage - T290005
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 08:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 07:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P23001 and previous config saved to /var/cache/conftool/dbconfig/20220323-074408-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P23000 and previous config saved to /var/cache/conftool/dbconfig/20220323-072904-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22999 and previous config saved to /var/cache/conftool/dbconfig/20220323-071400-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22998 and previous config saved to /var/cache/conftool/dbconfig/20220323-065856-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22997 and previous config saved to /var/cache/conftool/dbconfig/20220323-064353-root.json
  • 06:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1112.eqiad.wmnet with OS bullseye
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: host reimage
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1112.eqiad.wmnet with OS bullseye
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 for reimage', diff saved to https://phabricator.wikimedia.org/P22996 and previous config saved to /var/cache/conftool/dbconfig/20220323-060533-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 with low weight T301879', diff saved to https://phabricator.wikimedia.org/P22995 and previous config saved to /var/cache/conftool/dbconfig/20220323-060351-marostegui.json
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:20 ejegg: updated payments-wiki from 3048f0aa to 28e24856
  • 00:11 cjming: end running skin preference update script T299104

2022-03-22

  • 23:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 23:39 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
  • 23:35 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1024.eqiad.wmnet with reason: host reimage
  • 23:23 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 23:11 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:27 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 22:22 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 22:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 22:20 ryankemper: T301511 Mutated cirrus codfw cluster settings to what [I think] they should be, see https://phabricator.wikimedia.org/T301511#7798415; forcing re-check
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300775)', diff saved to https://phabricator.wikimedia.org/P22993 and previous config saved to /var/cache/conftool/dbconfig/20220322-221503-marostegui.json
  • 22:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 22:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 22:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22992 and previous config saved to /var/cache/conftool/dbconfig/20220322-221455-marostegui.json
  • 22:09 ryankemper: T301511 Forcing recheck of codfw cirrus setting check
  • 22:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 22:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 21:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22991 and previous config saved to /var/cache/conftool/dbconfig/20220322-215950-marostegui.json
  • 21:59 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 21:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 21:58 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1026.eqiad.wmnet with reason: host reimage
  • 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22990 and previous config saved to /var/cache/conftool/dbconfig/20220322-214445-marostegui.json
  • 21:39 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:35 ryankemper: T301511 Fixed elastic* eqiad cross-cluster search settings (see https://phabricator.wikimedia.org/T301511#7798267) to resolve the `ElasticSearch setting check` alerts in eqiad
  • 21:33 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22989 and previous config saved to /var/cache/conftool/dbconfig/20220322-212939-marostegui.json
  • 21:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:18 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:32 urbanecm: UTC late backport window done
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ce18d4e: testwiki: enable testing of topics match mode for GLAM events (T301825) (duration: 01m 06s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 17caf03: Enable EventGate logging for WikipediaPortal schema (T271163) (duration: 01m 54s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22986 and previous config saved to /var/cache/conftool/dbconfig/20220322-191049-marostegui.json
  • 19:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:02 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22985 and previous config saved to /var/cache/conftool/dbconfig/20220322-185542-marostegui.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22984 and previous config saved to /var/cache/conftool/dbconfig/20220322-184037-marostegui.json
  • 18:30 razzi: remove old karapace1001 known hosts following reimage: `razzi@puppetmaster1001:~$ ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "karapace1001.eqiad.wmnet"`
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22982 and previous config saved to /var/cache/conftool/dbconfig/20220322-182531-marostegui.json
  • 18:01 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided) (duration: 05m 16s)
  • 17:55 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@c4d0736]: (no justification provided)
  • 17:50 dcausse@deploy1002: Started scap: (no justification provided)
  • 17:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22981 and previous config saved to /var/cache/conftool/dbconfig/20220322-173301-marostegui.json
  • 17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298557)', diff saved to https://phabricator.wikimedia.org/P22980 and previous config saved to /var/cache/conftool/dbconfig/20220322-173253-marostegui.json
  • 17:25 brennen: trainsperiment (T300203): with 1.39.0-wmf.3 on all wikis, we're paused for a planned catchup window - nothing to do at the moment, we'll deploy 1.39.0-wmf.4 tomorrow (2022-03-23).
  • 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22979 and previous config saved to /var/cache/conftool/dbconfig/20220322-171748-marostegui.json
  • 17:15 taavi: deploy security patch for T304354
  • 17:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
  • 17:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1004.eqiad.wmnet with reason: host reimage
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22978 and previous config saved to /var/cache/conftool/dbconfig/20220322-170243-marostegui.json
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298557)', diff saved to https://phabricator.wikimedia.org/P22974 and previous config saved to /var/cache/conftool/dbconfig/20220322-164738-marostegui.json
  • 16:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 16:35 ebernhardson: T303548 start wikidatawiki reindexing on eqiad, codfw and cloudelastic cirrus clusters
  • 16:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298557)', diff saved to https://phabricator.wikimedia.org/P22973 and previous config saved to /var/cache/conftool/dbconfig/20220322-162917-marostegui.json
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22972 and previous config saved to /var/cache/conftool/dbconfig/20220322-162904-marostegui.json
  • 16:27 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 16:27 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 16:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22971 and previous config saved to /var/cache/conftool/dbconfig/20220322-161359-marostegui.json
  • 16:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 16:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:11 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:09 btullis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:07 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 16:00 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:59 moritzm: imported jvmquake 1.0.1 for stretch/buster (JDK8) and bullseye (JDK11)
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22970 and previous config saved to /var/cache/conftool/dbconfig/20220322-155854-marostegui.json
  • 15:56 btullis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22969 and previous config saved to /var/cache/conftool/dbconfig/20220322-154349-marostegui.json
  • 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 15:29 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1003.eqiad.wmnet with reason: host reimage
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22968 and previous config saved to /var/cache/conftool/dbconfig/20220322-152508-marostegui.json
  • 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22967 and previous config saved to /var/cache/conftool/dbconfig/20220322-152247-root.json
  • 15:17 hashar: Gerrit 3.3.10 up and running T304226
  • 15:14 hashar: Stopping Gerrit for security update T304226
  • 15:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 T304226 (duration: 00m 10s)
  • 15:13 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit1001 T304226
  • 15:10 hashar: Upgrading and starting Gerrit on gerrit2001 (replica)
  • 15:06 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:06 hashar@deploy1002: Finished deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 T304226 (duration: 00m 12s)
  • 15:06 hashar@deploy1002: Started deploy [gerrit/gerrit@967b0d7]: Gerrit to 3.3.10 on gerrit2001 T304226
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298557)', diff saved to https://phabricator.wikimedia.org/P22965 and previous config saved to /var/cache/conftool/dbconfig/20220322-144855-marostegui.json
  • 14:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298557)', diff saved to https://phabricator.wikimedia.org/P22964 and previous config saved to /var/cache/conftool/dbconfig/20220322-144847-marostegui.json
  • 14:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22963 and previous config saved to /var/cache/conftool/dbconfig/20220322-143341-marostegui.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22962 and previous config saved to /var/cache/conftool/dbconfig/20220322-141836-marostegui.json
  • 13:52 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 13:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1001.eqiad.wmnet
  • 13:44 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1004.eqiad.wmnet
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298557)', diff saved to https://phabricator.wikimedia.org/P22960 and previous config saved to /var/cache/conftool/dbconfig/20220322-134148-marostegui.json
  • 13:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:40 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 13:40 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.3 refs T300203
  • 13:36 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1004.eqiad.wmnet
  • 13:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudmetrics1003.eqiad.wmnet
  • 13:33 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2001-dev.codfw.wmnet
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:27 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudmetrics1003.eqiad.wmnet
  • 13:27 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.3 refs T300203 (duration: 00m 52s)
  • 13:26 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.3 refs T300203
  • 13:26 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2001-dev.codfw.wmnet
  • 13:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 13:20 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 13:19 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudgw2002-dev.codfw.wmnet
  • 13:19 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 12:54 moritzm: installing 5.10.103 kernels on servers running a kernel from buster backports T303179
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 12:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22959 and previous config saved to /var/cache/conftool/dbconfig/20220322-124117-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22958 and previous config saved to /var/cache/conftool/dbconfig/20220322-124109-root.json
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132 after testing', diff saved to https://phabricator.wikimedia.org/P22957 and previous config saved to /var/cache/conftool/dbconfig/20220322-123056-marostegui.json
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22956 and previous config saved to /var/cache/conftool/dbconfig/20220322-122613-root.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22955 and previous config saved to /var/cache/conftool/dbconfig/20220322-122605-root.json
  • 12:24 marostegui: dbmaint s3@eqiad T300600
  • 12:24 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WRITE BOTH on rest of s6 for templatelinks normalization (T299421) (duration: 00m 54s)
  • 12:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:21 marostegui: dbmaint s7@eqiad T300992
  • 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:18 marostegui: dbmaint s6@eqiad T300992
  • 12:17 marostegui: dbmaint s5@eqiad T300992
  • 12:16 marostegui: dbmaint s8@eqiad T300992
  • 12:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:12 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WRITE BOTH for templatelinks normalization in wikitech (T299421) (duration: 01m 41s)
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22954 and previous config saved to /var/cache/conftool/dbconfig/20220322-121110-root.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22953 and previous config saved to /var/cache/conftool/dbconfig/20220322-121101-root.json
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22952 and previous config saved to /var/cache/conftool/dbconfig/20220322-120123-marostegui.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22951 and previous config saved to /var/cache/conftool/dbconfig/20220322-115606-root.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22950 and previous config saved to /var/cache/conftool/dbconfig/20220322-115557-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22949 and previous config saved to /var/cache/conftool/dbconfig/20220322-114618-marostegui.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22948 and previous config saved to /var/cache/conftool/dbconfig/20220322-114102-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22946 and previous config saved to /var/cache/conftool/dbconfig/20220322-114051-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22945 and previous config saved to /var/cache/conftool/dbconfig/20220322-113113-marostegui.json
  • 11:31 marostegui: Reboot db1100 and db1123 for kernel upgrade before master swap
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 for reboot', diff saved to https://phabricator.wikimedia.org/P22944 and previous config saved to /var/cache/conftool/dbconfig/20220322-113003-marostegui.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 for reboot', diff saved to https://phabricator.wikimedia.org/P22943 and previous config saved to /var/cache/conftool/dbconfig/20220322-112931-marostegui.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22942 and previous config saved to /var/cache/conftool/dbconfig/20220322-111607-marostegui.json
  • 11:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:46 mmandere: pool cp1077 with HAProxy as TLS termination layer - T290005
  • 10:41 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1077.eqiad.wmnet with OS buster
  • 10:26 _joe_: running check-restart-php on api appservers
  • 10:22 _joe_: running check-and-restart on mw-eqiad-appservers
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22940 and previous config saved to /var/cache/conftool/dbconfig/20220322-101354-marostegui.json
  • 10:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22939 and previous config saved to /var/cache/conftool/dbconfig/20220322-101346-marostegui.json
  • 10:03 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.3 refs T300203
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22938 and previous config saved to /var/cache/conftool/dbconfig/20220322-095841-marostegui.json
  • 09:54 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 09:54 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.3 refs T300203 (duration: 62m 07s)
  • 09:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1077.eqiad.wmnet with reason: host reimage
  • 09:46 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
  • 09:46 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cloudcontrol1005.wikimedia.org with reason: dcaro testing backups
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22937 and previous config saved to /var/cache/conftool/dbconfig/20220322-094335-marostegui.json
  • 09:34 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1077.eqiad.wmnet with OS buster
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22936 and previous config saved to /var/cache/conftool/dbconfig/20220322-092830-marostegui.json
  • 09:25 mmandere: depool cp1077 for reimage - T290005
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P22935 and previous config saved to /var/cache/conftool/dbconfig/20220322-091718-root.json
  • 09:11 dcausse: restarted blazegraph on wdqs2002 (deadlocked)
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P22934 and previous config saved to /var/cache/conftool/dbconfig/20220322-090214-root.json
  • 08:59 XioNoX: drmrs propagate LVS med to core routers
  • 08:52 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.3 refs T300203
  • 08:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P22933 and previous config saved to /var/cache/conftool/dbconfig/20220322-084710-root.json
  • 08:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P22932 and previous config saved to /var/cache/conftool/dbconfig/20220322-083206-root.json
  • 08:19 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298557)', diff saved to https://phabricator.wikimedia.org/P22931 and previous config saved to /var/cache/conftool/dbconfig/20220322-081806-marostegui.json
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22930 and previous config saved to /var/cache/conftool/dbconfig/20220322-081758-marostegui.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P22929 and previous config saved to /var/cache/conftool/dbconfig/20220322-081702-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 some more weight T301879', diff saved to https://phabricator.wikimedia.org/P22928 and previous config saved to /var/cache/conftool/dbconfig/20220322-080713-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22927 and previous config saved to /var/cache/conftool/dbconfig/20220322-080253-marostegui.json
  • 07:57 urbanecm: UTC morning backport window completed
  • 07:57 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/GrowthExperiments/modules/ext.growthExperiments.MentorDashboard/MenteeOverview/MenteeOverviewPresets.js: 84877bd: MenteeOverviewPresets.getUsersToShow: Fix typo (T304353) (duration: 00m 49s)
  • 07:53 elukey: restart php-fpm on mw1449 - opcache full after deployment
  • 07:49 elukey: restart php-fpm on mw1448 - high cpu usage right after yesterday's deployment at 21 UTC
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22925 and previous config saved to /var/cache/conftool/dbconfig/20220322-074748-marostegui.json
  • 07:47 elukey: depool mw1448 manually on the node (high cpu usage from php-fpm)
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22924 and previous config saved to /var/cache/conftool/dbconfig/20220322-073243-marostegui.json
  • 07:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8151bf2: Allow flooders to remove the group from themselves in viwiki (T303578) (duration: 00m 50s)
  • 07:21 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 07:17 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: caad5a4: wgCrossSiteAJAXdomains: Add foundationwiki and {ee,ge,punjabi}wikimedia (T300978) (duration: 00m 49s)
  • 07:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b4a9935: Create "editautopatrolprotected" protection level for viwiki (T303579) (duration: 00m 57s)
  • 07:08 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 07:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 06:54 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300775)', diff saved to https://phabricator.wikimedia.org/P22923 and previous config saved to /var/cache/conftool/dbconfig/20220322-064230-marostegui.json
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22922 and previous config saved to /var/cache/conftool/dbconfig/20220322-064222-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298557)', diff saved to https://phabricator.wikimedia.org/P22921 and previous config saved to /var/cache/conftool/dbconfig/20220322-063223-marostegui.json
  • 06:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 06:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22920 and previous config saved to /var/cache/conftool/dbconfig/20220322-062717-marostegui.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to s1 with minimal weight T301879', diff saved to https://phabricator.wikimedia.org/P22919 and previous config saved to /var/cache/conftool/dbconfig/20220322-062310-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1132 to dbctl T301879', diff saved to https://phabricator.wikimedia.org/P22918 and previous config saved to /var/cache/conftool/dbconfig/20220322-062140-marostegui.json
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1175.eqiad.wmnet with OS bullseye
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22917 and previous config saved to /var/cache/conftool/dbconfig/20220322-061212-marostegui.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22916 and previous config saved to /var/cache/conftool/dbconfig/20220322-055707-marostegui.json
  • 05:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
  • 05:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: host reimage
  • 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
  • 03:47 eileen: civicrm revision changed from 457adec4 to b6ceb722
  • 02:56 eileen: civicrm revision changed from 30c55f51 to 457adec4
  • 02:56 eileen: revision changed from 30c55f51 to 457adec4
  • 02:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 02:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 01:35 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 00:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
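
The db1100 entries above show a staged repool after reboot: 10% at 11:41, then 25%, 50%, 75% and 100% at roughly 15-minute intervals, finishing at 12:41. The sketch below only reproduces that timetable so the expected log times can be checked at a glance. It is not the dbctl or cookbook implementation. The function name `repool_schedule` is hypothetical, and the 15-minute spacing is inferred from the timestamps rather than taken from any tooling configuration.

```python
from datetime import datetime, timedelta

# Percentages copied from the "(re)pooling @ N%" entries in the log above.
STEPS = (10, 25, 50, 75, 100)


def repool_schedule(start, steps=STEPS, interval_minutes=15):
    """Return (HH:MM, percentage) pairs, one per step, at a fixed spacing.

    `start` is the HH:MM time of the first step. The default interval is an
    assumption inferred from the log timestamps, not a documented setting.
    """
    first = datetime.strptime(start, "%H:%M")
    return [
        ((first + timedelta(minutes=i * interval_minutes)).strftime("%H:%M"), pct)
        for i, pct in enumerate(steps)
    ]


if __name__ == "__main__":
    # 11:41 is the time of the first db1100 repool entry (10%).
    for when, pct in repool_schedule("11:41"):
        print(f"{when} repool at {pct}%")
```

Run with a start of 11:41, the computed times line up with the db1100 entries logged at 11:41, 11:56, 12:11, 12:26 and 12:41.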

2022-03-21

  • 23:52 eileen: civicrm revision changed from 52c45874 to 30c55f51
  • 22:29 ryankemper: T301955 Lifted downtime on relforge now that cluster upgrade is complete and cluster is back to green status
  • 22:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 reedy@deploy1002: Synchronized php-1.39.0-wmf.2/extensions/OATHAuth/: T304350 (duration: 00m 49s)
  • 22:03 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/OATHAuth/: T304350 (duration: 00m 49s)
  • 21:59 ryankemper: T301955 Downtimed relforge for 2 days; stuck in yellow status during upgrade b/c replica shards cannot be scheduled to a host of lower elasticsearch version than primary shards. Working on patch for our `rolling-operation` cookbook to disable replication during operation
  • 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 21:46 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 21:46 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 21:45 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:45 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 21:45 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:43 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:43 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 21:40 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 21:36 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 21:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 21:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 21:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 21:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 21:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 21:28 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 21:28 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 21:26 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 21:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 21:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 21:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 21:10 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:03 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:56 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.2 refs T300203
  • 20:52 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:49 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.2 refs T300203 (duration: 00m 51s)
  • 20:49 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.2 refs T300203
  • 20:31 urbanecm: UTC late backport window completed
  • 20:29 urbanecm@deploy1002: Synchronized docroot/noc/db.php: 3bcccdc: Migrate away from $wmfDbconfigFromEtcd (T45956; 2/2) (duration: 00m 50s)
  • 20:29 urbanecm@deploy1002: Synchronized wmf-config/etcd.php: 3bcccdc: Migrate away from $wmfDbconfigFromEtcd (T45956; 1/2) (duration: 00m 50s)
  • 20:19 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 8347de5: ExtensionDistributor: Add REL1_38 (T304185) (duration: 00m 51s)
  • 19:48 brennen: mw1416: sudo -i /usr/local/sbin/restart-php7.2-fpm
  • 19:42 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.2 refs T300203
  • 19:26 brennen@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.2 refs T300203 (duration: 64m 33s)
  • 18:54 ebernhardson: T303548 start commonswiki reindexing on eqiad codfw and cloudelastic cirrus clusters
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22906 and previous config saved to /var/cache/conftool/dbconfig/20220321-185042-marostegui.json
  • 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22905 and previous config saved to /var/cache/conftool/dbconfig/20220321-183537-marostegui.json
  • 18:22 brennen@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.2 refs T300203
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22904 and previous config saved to /var/cache/conftool/dbconfig/20220321-182032-marostegui.json
  • 18:19 otto@deploy1002: Finished deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - T294420 (duration: 04m 41s)
  • 18:19 brennen: trainsperiment (T300203): 1.39.0-wmf.1 on all wikis; starting prep of wmf.2, will abort if needed
  • 18:15 otto@deploy1002: Started deploy [analytics/refinery@2175d63]: gobblin prometheus metrics for all jobs - T294420
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22903 and previous config saved to /var/cache/conftool/dbconfig/20220321-180526-marostegui.json
  • 18:04 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.1 refs T300203
  • 18:03 otto@deploy1002: Finished deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - T294420 (duration: 07m 19s)
  • 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1021 for T302233
  • 17:59 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1020 for T302233
  • 17:57 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1016 for T302233
  • 17:55 otto@deploy1002: Started deploy [analytics/refinery@2175d63] (hadoop-test): gobblin prometheus metrics for all jobs - T294420
  • 17:53 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 17:51 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1017 for T302233
  • 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1013 for T302233
  • 17:49 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 for T302233
  • 17:46 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1014
  • 17:41 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
  • 17:40 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS (duration: 02m 12s)
  • 17:38 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.107` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
  • 17:37 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7] (wcqs): Deploy 0.3.107 to WCQS
  • 17:35 razzi: `sudo maintain-views --all-databases --replace-all --table flaggedrevs` on clouddb1018 after same command without `--table` argument timed out waiting for `zhwiki_p.page`
  • 17:32 ryankemper: [Maps] Running puppet agent on rest of `maps*`: `ryankemper@cumin1001:~$ sudo -E cumin -b 4 'maps*' 'run-puppet-agent'`
  • 17:31 ryankemper: [Maps] Ran puppet agent on maps master `maps1009` to verify puppet patch works; looks like osm import was disabled as intended `Notice: /Stage[main]/Osm::Imposm3/Systemd::Service[imposm]/Service[imposm]/ensure: ensure changed 'running' to 'stopped'`
  • 17:26 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 17:25 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 17:22 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2b67de7]: 0.3.107 (duration: 08m 26s)
  • 17:15 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.107` on canary `wdqs1003`; proceeding to rest of fleet
  • 17:14 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2b67de7]: 0.3.107
  • 17:13 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.107`. Pre-deploy tests passing on canary `wdqs1003`
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22902 and previous config saved to /var/cache/conftool/dbconfig/20220321-170731-marostegui.json
  • 17:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:58 brennen: trainsperiment (T300203): blockers currently cleared, will hold wmf.1 -> group2 until 18:00 UTC, per deployment calendar
  • 16:55 taavi@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: PageSplitter: check for OutputPage::getTitle() returning null (T304331) (duration: 00m 50s)
  • 16:53 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaEvents/includes/PageSplitter/PageSplitterHooks.php: Backport: PageSplitter: check for OutputPage::getTitle() returning null (T304331) (duration: 00m 51s)
  • 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 16:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:46 razzi@cumin1001: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 16:44 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Wikibase/repo/: Backport: Add display to wbsearchentities response even if empty (T104344) (duration: 00m 53s)
  • 16:15 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 16:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 16:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 16:13 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 16:12 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 16:12 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:11 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:11 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 16:09 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 16:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298557)', diff saved to https://phabricator.wikimedia.org/P22901 and previous config saved to /var/cache/conftool/dbconfig/20220321-160557-marostegui.json
  • 16:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 16:04 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 16:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 16:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 16:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 16:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:59 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 15:59 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 15:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 15:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:57 otto@deploy1002: Finished deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - T294420 (duration: 07m 18s)
  • 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:57 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 15:56 reedy@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: T304111 (duration: 00m 50s)
  • 15:56 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 15:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 15:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 15:54 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 15:54 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22900 and previous config saved to /var/cache/conftool/dbconfig/20220321-155052-marostegui.json
  • 15:50 otto@deploy1002: Started deploy [analytics/refinery@cd7bf7a] (hadoop-test): fix prometheus pushgateway url - T294420
  • 15:50 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - T294420 (duration: 00m 03s)
  • 15:50 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): fix prometheus pushgateway url - T294420
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300775)', diff saved to https://phabricator.wikimedia.org/P22899 and previous config saved to /var/cache/conftool/dbconfig/20220321-154607-marostegui.json
  • 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22898 and previous config saved to /var/cache/conftool/dbconfig/20220321-154559-marostegui.json
  • 15:44 reedy@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/Flow/maintenance: T304318 (duration: 00m 49s)
  • 15:44 razzi@cumin1001: START - Cookbook sre.wikireplicas.update-views
  • 15:43 reedy@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Flow/maintenance: T304318 (duration: 00m 51s)
  • 15:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 53s)
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22897 and previous config saved to /var/cache/conftool/dbconfig/20220321-153547-marostegui.json
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22896 and previous config saved to /var/cache/conftool/dbconfig/20220321-153054-marostegui.json
  • 15:27 mmandere: pool cp1075 with HAProxy as TLS termination layer - T290005
  • 15:23 jnuche@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.1 refs T300203 (duration: 01m 54s)
  • 15:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.1 refs T300203
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298557)', diff saved to https://phabricator.wikimedia.org/P22895 and previous config saved to /var/cache/conftool/dbconfig/20220321-152041-marostegui.json
  • 15:19 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1075.eqiad.wmnet with OS buster
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22894 and previous config saved to /var/cache/conftool/dbconfig/20220321-151549-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T298557)', diff saved to https://phabricator.wikimedia.org/P22893 and previous config saved to /var/cache/conftool/dbconfig/20220321-150417-marostegui.json
  • 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 15:01 otto@deploy1002: Finished deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939 (duration: 07m 10s)
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22891 and previous config saved to /var/cache/conftool/dbconfig/20220321-150044-marostegui.json
  • 14:58 XioNoX: asw2-b-eqiad> request virtual-chassis vc-port delete pic-slot 1 port 2 member 5 - T304316
  • 14:54 otto@deploy1002: Started deploy [analytics/refinery@33f66db] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939
  • 14:49 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 14:47 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1075.eqiad.wmnet with reason: host reimage
  • 14:43 oblivian@puppetmaster1001: conftool action : set/enabled=false; selector: name=parameter_q,cluster=cache-text
  • 14:35 hashar: Restarting CI Zuul server
  • 14:31 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1 refs T300203
  • 14:31 hashar: restarting Apache on gerrit2001 and gerrit1001
  • 14:30 mmandere@cumin1001: START - Cookbook sre.hosts.reimage for host cp1075.eqiad.wmnet with OS buster
  • 14:28 hashar@deploy1002: Synchronized php-1.39.0-wmf.1/extensions/Echo: Revert "Call IDatabase::timestamp before inserting rows" - T304307 (duration: 00m 52s)
  • 14:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298557)', diff saved to https://phabricator.wikimedia.org/P22890 and previous config saved to /var/cache/conftool/dbconfig/20220321-141922-marostegui.json
  • 14:11 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 14:09 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 14:07 mmandere: depool cp1075 - T290005
  • 14:07 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:05 mmandere: depool cp1074 - T290005
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22889 and previous config saved to /var/cache/conftool/dbconfig/20220321-140417-marostegui.json
  • 14:04 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 14:03 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:03 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 14:02 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 14:01 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:55 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939 (duration: 00m 06s)
  • 13:55 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22888 and previous config saved to /var/cache/conftool/dbconfig/20220321-134912-marostegui.json
  • 13:34 Lucas_WMDE: UTC afternoon backport window done
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298557)', diff saved to https://phabricator.wikimedia.org/P22887 and previous config saved to /var/cache/conftool/dbconfig/20220321-133407-marostegui.json
  • 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "ptwiki: Disable Growth's image recommendation" (T304095) (duration: 00m 49s)
  • 13:29 otto@deploy1002: Finished deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939 (duration: 08m 53s)
  • 13:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused wgWMESearchRelevancePages config variable (duration: 00m 50s)
  • 13:20 otto@deploy1002: Started deploy [analytics/refinery@11909fa] (hadoop-test): gobblin-wmf-core-1.0.1 - T297939
  • 13:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Create "pagemover" group at azwiki (T303752) (duration: 00m 50s)
  • 13:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove changetags right from users on wikidatawiki and testwikidatawiki (T303682) (while keeping applychangetags right) (duration: 00m 49s)
  • 12:58 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 12:48 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 12:42 jnuche@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.38.0-wmf.26"
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T298557)', diff saved to https://phabricator.wikimedia.org/P22885 and previous config saved to /var/cache/conftool/dbconfig/20220321-123055-marostegui.json
  • 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298557)', diff saved to https://phabricator.wikimedia.org/P22884 and previous config saved to /var/cache/conftool/dbconfig/20220321-123042-marostegui.json
  • 12:22 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.1 refs T300203
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22883 and previous config saved to /var/cache/conftool/dbconfig/20220321-121537-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22882 and previous config saved to /var/cache/conftool/dbconfig/20220321-120032-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298557)', diff saved to https://phabricator.wikimedia.org/P22881 and previous config saved to /var/cache/conftool/dbconfig/20220321-114527-marostegui.json
  • 11:41 jnuche@deploy1002: Pruned MediaWiki: 1.38.0-wmf.25 (duration: 01m 32s)
  • 11:38 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to 2ef5c2d (duration: 01m 40s)
  • 11:36 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (eqiad): Update kartotherian to 2ef5c2d
  • 11:36 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to 2ef5c2d (duration: 02m 51s)
  • 11:35 jnuche@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.1 refs T300203 (duration: 81m 15s)
  • 11:33 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@cbc85d3] (codfw): Update kartotherian to 2ef5c2d
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T298557)', diff saved to https://phabricator.wikimedia.org/P22880 and previous config saved to /var/cache/conftool/dbconfig/20220321-112217-marostegui.json
  • 11:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22879 and previous config saved to /var/cache/conftool/dbconfig/20220321-112210-marostegui.json
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22878 and previous config saved to /var/cache/conftool/dbconfig/20220321-110705-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22877 and previous config saved to /var/cache/conftool/dbconfig/20220321-105159-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22876 and previous config saved to /var/cache/conftool/dbconfig/20220321-103654-marostegui.json
  • 10:13 jnuche@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.1 refs T300203
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22875 and previous config saved to /var/cache/conftool/dbconfig/20220321-094614-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:32 hashar: 1.39.0-wmf.1 train is delayed due to a CI / npm build failure which is being resolved T300203
  • 09:08 dcausse: restarting blazegraph on wdqs2001 (stuck)
  • 09:07 moritzm: restarting FPM
  • 09:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298557)', diff saved to https://phabricator.wikimedia.org/P22874 and previous config saved to /var/cache/conftool/dbconfig/20220321-090250-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22873 and previous config saved to /var/cache/conftool/dbconfig/20220321-084745-marostegui.json
  • 08:43 hashar: Train blocked due to an npm checksum mismatch preventing CI from merging in the mediawiki/core 1.39.0-wmf.1 change which creates the branch. T304286
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22872 and previous config saved to /var/cache/conftool/dbconfig/20220321-083240-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22871 and previous config saved to /var/cache/conftool/dbconfig/20220321-083050-root.json
  • 08:23 dcausse: restarting blazegraph on wdqs2003 (stuck for 16 hours)
  • 08:19 moritzm: installing openssl security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298557)', diff saved to https://phabricator.wikimedia.org/P22870 and previous config saved to /var/cache/conftool/dbconfig/20220321-081735-marostegui.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22869 and previous config saved to /var/cache/conftool/dbconfig/20220321-081546-root.json
  • 08:09 dcausse: restarting blazegraph on wdqs2004 and wdqs2002 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22868 and previous config saved to /var/cache/conftool/dbconfig/20220321-080042-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22867 and previous config saved to /var/cache/conftool/dbconfig/20220321-074538-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22866 and previous config saved to /var/cache/conftool/dbconfig/20220321-073033-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298557)', diff saved to https://phabricator.wikimedia.org/P22865 and previous config saved to /var/cache/conftool/dbconfig/20220321-072902-marostegui.json
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22864 and previous config saved to /var/cache/conftool/dbconfig/20220321-072854-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22863 and previous config saved to /var/cache/conftool/dbconfig/20220321-071349-marostegui.json
  • 07:12 urbanecm: UTC morning B&C done
  • 07:08 urbanecm: Create `wikishared.cx_significant_edits` and `wikishared.cx_section_translation` at x1 (T302371; `mwscript sql.php --wiki=aawiki --wikidb=wikishared --cluster=extension1 /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/{section-translations,significant-edits}.sql`)
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22862 and previous config saved to /var/cache/conftool/dbconfig/20220321-065844-marostegui.json
  • 06:43 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22861 and previous config saved to /var/cache/conftool/dbconfig/20220321-064339-marostegui.json
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
  • 06:19 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1175.eqiad.wmnet with OS bullseye
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1175.eqiad.wmnet with OS bullseye
  • 05:52 marostegui: dbmaint s5@eqiad T300600
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22860 and previous config saved to /var/cache/conftool/dbconfig/20220321-055202-marostegui.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298557)', diff saved to https://phabricator.wikimedia.org/P22859 and previous config saved to /var/cache/conftool/dbconfig/20220321-054838-marostegui.json
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P22858 and previous config saved to /var/cache/conftool/dbconfig/20220321-054358-marostegui.json
  • 05:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
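
The db1096:3316 entries above (07:30 to 08:30) record a repool done in steps of 10%, 25%, 50%, 75% and 100%. As a minimal sketch, the steps and the spacing between them can be read back out of the log text with a regular expression. The `ENTRIES` list is abbreviated from the lines above, and `repool_steps` is a hypothetical helper written for this illustration; it is not part of dbctl or any other tool named in the log.

```python
import re

# Abbreviated copies of the db1096:3316 log entries, oldest first.
ENTRIES = [
    "07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change'",
    "07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change'",
    "08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change'",
    "08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change'",
    "08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change'",
]

# HH:MM at the start of the entry, then the quoted "<instance> (re)pooling @ N%".
STEP_RE = re.compile(r"^(\d{2}):(\d{2}) .*'(\S+) \(re\)pooling @ (\d+)%")


def repool_steps(entries):
    """Return (minutes_since_midnight, instance, percent) per matching entry."""
    steps = []
    for line in entries:
        match = STEP_RE.search(line)
        if match:
            hours, minutes, instance, percent = match.groups()
            steps.append((int(hours) * 60 + int(minutes), instance, int(percent)))
    return steps


steps = repool_steps(ENTRIES)
percents = [pct for _, _, pct in steps]
# Minutes between consecutive steps, based on the logged HH:MM only.
gaps = [later[0] - earlier[0] for earlier, later in zip(steps, steps[1:])]

print(percents)  # [10, 25, 50, 75, 100]
print(gaps)      # [15, 15, 15, 15]
```

The gaps use only the minute-resolution time at the start of each entry, so they show the approximate cadence rather than exact commit times.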

2022-03-20

  • 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22857 and previous config saved to /var/cache/conftool/dbconfig/20220320-234358-marostegui.json
  • 23:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300775)', diff saved to https://phabricator.wikimedia.org/P22856 and previous config saved to /var/cache/conftool/dbconfig/20220320-234350-marostegui.json
  • 23:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22855 and previous config saved to /var/cache/conftool/dbconfig/20220320-232845-marostegui.json
  • 23:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22854 and previous config saved to /var/cache/conftool/dbconfig/20220320-231340-marostegui.json
  • 22:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300775)', diff saved to https://phabricator.wikimedia.org/P22853 and previous config saved to /var/cache/conftool/dbconfig/20220320-225835-marostegui.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300775)', diff saved to https://phabricator.wikimedia.org/P22850 and previous config saved to /var/cache/conftool/dbconfig/20220320-081713-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300775)', diff saved to https://phabricator.wikimedia.org/P22849 and previous config saved to /var/cache/conftool/dbconfig/20220320-081705-marostegui.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22848 and previous config saved to /var/cache/conftool/dbconfig/20220320-080200-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22847 and previous config saved to /var/cache/conftool/dbconfig/20220320-074655-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300775)', diff saved to https://phabricator.wikimedia.org/P22846 and previous config saved to /var/cache/conftool/dbconfig/20220320-073150-marostegui.json

2022-03-19

  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300775)', diff saved to https://phabricator.wikimedia.org/P22845 and previous config saved to /var/cache/conftool/dbconfig/20220319-171757-marostegui.json
  • 17:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300775)', diff saved to https://phabricator.wikimedia.org/P22844 and previous config saved to /var/cache/conftool/dbconfig/20220319-171749-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22843 and previous config saved to /var/cache/conftool/dbconfig/20220319-170244-marostegui.json
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22842 and previous config saved to /var/cache/conftool/dbconfig/20220319-164739-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300775)', diff saved to https://phabricator.wikimedia.org/P22841 and previous config saved to /var/cache/conftool/dbconfig/20220319-163234-marostegui.json
  • 13:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 13:48 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 13:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 13:34 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 13:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=piwiki --move-talk --fix # T304201
  • 13:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 04:26 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 04:05 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 04:01 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 03:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:28 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:28 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 03:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300775)', diff saved to https://phabricator.wikimedia.org/P22839 and previous config saved to /var/cache/conftool/dbconfig/20220319-015847-marostegui.json
  • 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300775)', diff saved to https://phabricator.wikimedia.org/P22838 and previous config saved to /var/cache/conftool/dbconfig/20220319-015839-marostegui.json
  • 01:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 01:46 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22837 and previous config saved to /var/cache/conftool/dbconfig/20220319-014334-marostegui.json
  • 01:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22836 and previous config saved to /var/cache/conftool/dbconfig/20220319-012829-marostegui.json
  • 01:23 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300775)', diff saved to https://phabricator.wikimedia.org/P22835 and previous config saved to /var/cache/conftool/dbconfig/20220319-011324-marostegui.json
  • 00:58 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 00:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye

2022-03-18

  • 21:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 21:12 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 15:38 jayme: powercycle kubernetes1002
  • 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:26 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/FlaggedRevs/backend/FlaggedRevs.php: Backport: Don't pass the revision to PO access service (T304127) (duration: 00m 49s)
  • 14:12 XioNoX: configure NAT for civi1002 - T304098
  • 14:02 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 14:02 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 14:01 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:01 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 13:59 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:59 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
  • 13:07 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
  • 13:02 moritzm: imported python3.5 3.5.3-1+deb9u5+wmf1 to component/python35 T303801
  • 12:35 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 11:35 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:29 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:28 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:09 vgutierrez: rolling restart of nginx on ncredir instances to catch up on OpenSSL updates
  • 11:05 vgutierrez: restarting acme-chief and acme-chief API services to catch up on OpenSSL updates
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:54 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:52 akosiaris: drain kubernetes200[1-4] T303045
  • 10:51 akosiaris: depool kubernetes200[1-4] T303045
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2004.codfw.wmnet
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2003.codfw.wmnet
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2002.codfw.wmnet
  • 10:50 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2001.codfw.wmnet
  • 10:01 akosiaris: drain kubernetes100[1-4] T303044
  • 09:54 akosiaris: depool kubernetes100[1-4] from pybal T303044
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1004.eqiad.wmnet
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1003.eqiad.wmnet
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1002.eqiad.wmnet
  • 09:52 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1001.eqiad.wmnet
  • 09:42 akosiaris: uncordon kubernetes1018-1022. T293728. Nodes are live, ready to receive workloads and traffic.
  • 09:37 akosiaris: pool kubernetes1018-1022 in pybal. T293728
  • 09:37 akosiaris: pool kubernetes1018-1022 in pybal.
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1022.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1021.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1020.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1019.eqiad.wmnet
  • 09:37 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1018.eqiad.wmnet
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300775)', diff saved to https://phabricator.wikimedia.org/P22827 and previous config saved to /var/cache/conftool/dbconfig/20220318-093543-marostegui.json
  • 09:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1022.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1021.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1020.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1019.eqiad.wmnet
  • 09:35 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1018.eqiad.wmnet
  • 09:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:08 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298557)', diff saved to https://phabricator.wikimedia.org/P22826 and previous config saved to /var/cache/conftool/dbconfig/20220318-085517-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22825 and previous config saved to /var/cache/conftool/dbconfig/20220318-084012-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22824 and previous config saved to /var/cache/conftool/dbconfig/20220318-082507-marostegui.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298557)', diff saved to https://phabricator.wikimedia.org/P22823 and previous config saved to /var/cache/conftool/dbconfig/20220318-081002-marostegui.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22822 and previous config saved to /var/cache/conftool/dbconfig/20220318-072852-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298557)', diff saved to https://phabricator.wikimedia.org/P22821 and previous config saved to /var/cache/conftool/dbconfig/20220318-071758-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298557)', diff saved to https://phabricator.wikimedia.org/P22820 and previous config saved to /var/cache/conftool/dbconfig/20220318-071750-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22819 and previous config saved to /var/cache/conftool/dbconfig/20220318-071348-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22818 and previous config saved to /var/cache/conftool/dbconfig/20220318-070245-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22817 and previous config saved to /var/cache/conftool/dbconfig/20220318-065844-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22816 and previous config saved to /var/cache/conftool/dbconfig/20220318-064740-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22815 and previous config saved to /var/cache/conftool/dbconfig/20220318-064340-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22814 and previous config saved to /var/cache/conftool/dbconfig/20220318-063631-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22813 and previous config saved to /var/cache/conftool/dbconfig/20220318-063524-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298557)', diff saved to https://phabricator.wikimedia.org/P22812 and previous config saved to /var/cache/conftool/dbconfig/20220318-063235-marostegui.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22811 and previous config saved to /var/cache/conftool/dbconfig/20220318-062836-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22810 and previous config saved to /var/cache/conftool/dbconfig/20220318-062127-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22809 and previous config saved to /var/cache/conftool/dbconfig/20220318-062020-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22808 and previous config saved to /var/cache/conftool/dbconfig/20220318-061332-root.json
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1179.eqiad.wmnet with OS bullseye
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22807 and previous config saved to /var/cache/conftool/dbconfig/20220318-060623-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22806 and previous config saved to /var/cache/conftool/dbconfig/20220318-060516-root.json
  • 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
  • 05:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1179.eqiad.wmnet with reason: host reimage
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22805 and previous config saved to /var/cache/conftool/dbconfig/20220318-055119-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22804 and previous config saved to /var/cache/conftool/dbconfig/20220318-055012-root.json
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1179.eqiad.wmnet with OS bullseye
  • 05:39 marostegui: dbmaint on s3@eqiad T300600
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 reimage T300600', diff saved to https://phabricator.wikimedia.org/P22803 and previous config saved to /var/cache/conftool/dbconfig/20220318-053832-marostegui.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22802 and previous config saved to /var/cache/conftool/dbconfig/20220318-053615-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22801 and previous config saved to /var/cache/conftool/dbconfig/20220318-053508-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298557)', diff saved to https://phabricator.wikimedia.org/P22800 and previous config saved to /var/cache/conftool/dbconfig/20220318-053443-marostegui.json
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 05:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 01:23 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:16 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:14 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye

2022-03-17

  • 22:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:36 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.26 refs T300202
  • 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:28 derick@deploy1002: Synchronized wmf-config/MetaContactPages.php: Config: Add new field to capture application URL link on Meta (duration: 00m 50s)
  • 22:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:17 derick@deploy1002: Finished scap: Backport: Add & improve message for the chapter/thorg application contact form (duration: 11m 37s)
  • 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:05 derick@deploy1002: Started scap: Backport: Add & improve message for the chapter/thorg application contact form
  • 22:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:00 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Revert "Revert "Enable Parsoid API everywhere""" (duration: 00m 51s)
  • 21:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:48 brennen@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Revert "Enable Parsoid API everywhere"" (duration: 00m 51s)
  • 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:45 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 21:44 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 21:42 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 21:42 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 21:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 21:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 21:40 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 21:35 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 21:26 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:26 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:25 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 21:25 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 21:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 21:24 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:23 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 21:21 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: Update invalid skin preference update script (T299104) (duration: 00m 51s)
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:11 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.26 refs T300202 (duration: 00m 50s)
  • 21:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.26 refs T300202
  • 20:57 ladsgroup@deploy1002: Finished scap: Revert "rdbms: Followups to automatic connection recovery patch" (duration: 11m 50s)
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:45 ladsgroup@deploy1002: Started scap: Revert "rdbms: Followups to automatic connection recovery patch"
  • 20:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22798 and previous config saved to /var/cache/conftool/dbconfig/20220317-204128-marostegui.json
  • 20:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:39 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-cache1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:29 thcipriani@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable Parsoid API everywhere" (T302081) (duration: 00m 50s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22797 and previous config saved to /var/cache/conftool/dbconfig/20220317-202623-marostegui.json
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22796 and previous config saved to /var/cache/conftool/dbconfig/20220317-201118-marostegui.json
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22795 and previous config saved to /var/cache/conftool/dbconfig/20220317-195613-marostegui.json
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:53 jhuneidi@deploy1002: Synchronized php-1.38.0-wmf.26/skins/Vector/includes/Hooks.php: Backport: Fix updateUserLinksDropdownItems not being called (T304002) (duration: 00m 50s)
  • 18:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:27 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:18 akosiaris: cordon kubernetes10{18..22} T293728
  • 18:12 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:01 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:50 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:41 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:41 arturo: uploaded prometheus-openstack-exporter 0.0.8-4~wmf1 to bullseye-wikimedia (T302178)
  • 17:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 17:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 17:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 17:34 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
  • 17:34 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
  • 17:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 17:30 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 17:28 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:28 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:28 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: sync
  • 17:28 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: sync
  • 17:27 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 17:25 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 17:25 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:24 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:24 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 17:22 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 17:22 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:21 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:21 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 17:21 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:21 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:20 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 17:20 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 17:20 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 17:18 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 17:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:15 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
  • 17:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:10 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 17:09 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 17:09 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 17:08 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 17:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 17:07 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 17:06 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 17:06 bblack: geodns - Cyprus routed to new drmrs edge DC (first live users!) - will phase in over the standard 10 minute DNS TTL
  • 17:05 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 17:04 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 17:03 volans: restart atftp on install1003
  • 17:01 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:00 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:00 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:50 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:48 XioNoX: disable BGP to Lumen in codfw for fiber move
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298557)', diff saved to https://phabricator.wikimedia.org/P22794 and previous config saved to /var/cache/conftool/dbconfig/20220317-164228-marostegui.json
  • 16:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 16:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:36 moritzm: restarting LDAP replicas for openssl update
  • 16:35 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:35 dcaro@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:35 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:35 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache cloudcephmon1003.eqiad.wmnet on all recursors
  • 16:34 ryankemper: [WDQS] Pooled `wdqs2001` (caught up on lag)
  • 16:31 andrewbogott: sudo service networking restart on puppetmaster1003
  • 16:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22793 and previous config saved to /var/cache/conftool/dbconfig/20220317-162723-marostegui.json
  • 16:15 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22792 and previous config saved to /var/cache/conftool/dbconfig/20220317-161218-marostegui.json
  • 16:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:10 XioNoX: pfw3-codfw move traffic to cr2 uplink
  • 16:05 oblivian@puppetmaster1001: conftool action : edit; selector: name=random_q
  • 16:04 ryankemper: [WDQS] Depooled `wdqs2001` (~4.85 hours of lag to catch up)
  • 16:03 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 16:03 ryankemper: [WDQS] Pooled `wdqs2003` (caught up on lag)
  • 16:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:00 moritzm: restarting apache on logstash*
  • 15:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 60980ce: ptwiki: Disable Growth image recommendation (T302828) (duration: 00m 53s)
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298557)', diff saved to https://phabricator.wikimedia.org/P22790 and previous config saved to /var/cache/conftool/dbconfig/20220317-155713-marostegui.json
  • 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 XioNoX: cr1-codfw move xe-5/2/0 to xe-1/0/1:1 - T289241
  • 15:42 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:34 moritzm: restarting FPM on mw canaries
  • 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 15:31 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 15:30 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 15:07 XioNoX: disable BGP to Telia in codfw for fiber move - T289241
  • 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 15:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298557)', diff saved to https://phabricator.wikimedia.org/P22789 and previous config saved to /var/cache/conftool/dbconfig/20220317-145716-marostegui.json
  • 14:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298557)', diff saved to https://phabricator.wikimedia.org/P22788 and previous config saved to /var/cache/conftool/dbconfig/20220317-145708-marostegui.json
  • 14:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22785 and previous config saved to /var/cache/conftool/dbconfig/20220317-144203-marostegui.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22784 and previous config saved to /var/cache/conftool/dbconfig/20220317-142658-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298557)', diff saved to https://phabricator.wikimedia.org/P22783 and previous config saved to /var/cache/conftool/dbconfig/20220317-141152-marostegui.json
  • 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1067.eqiad.wmnet with reason: T303151
  • 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1067.eqiad.wmnet with reason: T303151
  • 14:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1063.eqiad.wmnet with reason: T303151
  • 14:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1063.eqiad.wmnet with reason: T303151
  • 13:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:43 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:17 Lucas_WMDE: UTC afternoon backport window done
  • 13:16 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: commonswiki: Add pictures.snsb.info to wgCopyUploadsDomains allowlist (T303929) (duration: 00m 50s)
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298557)', diff saved to https://phabricator.wikimedia.org/P22782 and previous config saved to /var/cache/conftool/dbconfig/20220317-131227-marostegui.json
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298557)', diff saved to https://phabricator.wikimedia.org/P22781 and previous config saved to /var/cache/conftool/dbconfig/20220317-131220-marostegui.json
  • 13:09 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Write "unexpectedUnconnectedPage" page prop on Beta – no expected behavior change in production (3/3) (duration: 00m 49s)
  • 13:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Write "unexpectedUnconnectedPage" page prop on Beta – no expected behavior change in production (2/3) (duration: 00m 49s)
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Write "unexpectedUnconnectedPage" page prop on Beta – no expected behavior change in production (1/3) (duration: 00m 53s)
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22780 and previous config saved to /var/cache/conftool/dbconfig/20220317-125715-marostegui.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22779 and previous config saved to /var/cache/conftool/dbconfig/20220317-124209-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298557)', diff saved to https://phabricator.wikimedia.org/P22778 and previous config saved to /var/cache/conftool/dbconfig/20220317-122704-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22777 and previous config saved to /var/cache/conftool/dbconfig/20220317-120700-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22776 and previous config saved to /var/cache/conftool/dbconfig/20220317-115156-root.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22775 and previous config saved to /var/cache/conftool/dbconfig/20220317-115012-root.json
  • 11:42 volans: upgraded spicerack on cumin hosts to v2.3.3
  • 11:41 volans: uploaded spicerack_2.3.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22774 and previous config saved to /var/cache/conftool/dbconfig/20220317-113652-root.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22773 and previous config saved to /var/cache/conftool/dbconfig/20220317-113508-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298557)', diff saved to https://phabricator.wikimedia.org/P22772 and previous config saved to /var/cache/conftool/dbconfig/20220317-112921-marostegui.json
  • 11:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22771 and previous config saved to /var/cache/conftool/dbconfig/20220317-112913-marostegui.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22770 and previous config saved to /var/cache/conftool/dbconfig/20220317-112148-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22769 and previous config saved to /var/cache/conftool/dbconfig/20220317-112004-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22768 and previous config saved to /var/cache/conftool/dbconfig/20220317-111408-marostegui.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22767 and previous config saved to /var/cache/conftool/dbconfig/20220317-110645-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298556)', diff saved to https://phabricator.wikimedia.org/P22766 and previous config saved to /var/cache/conftool/dbconfig/20220317-110536-marostegui.json
  • 11:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 11:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22765 and previous config saved to /var/cache/conftool/dbconfig/20220317-105903-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22764 and previous config saved to /var/cache/conftool/dbconfig/20220317-105349-marostegui.json
  • 10:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-fe[1005-1008].eqiad.wmnet
  • 10:47 marostegui: dbmaint on s3@eqiad T298556
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22763 and previous config saved to /var/cache/conftool/dbconfig/20220317-104358-marostegui.json
  • 10:40 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298556)', diff saved to https://phabricator.wikimedia.org/P22762 and previous config saved to /var/cache/conftool/dbconfig/20220317-103844-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T298556)', diff saved to https://phabricator.wikimedia.org/P22761 and previous config saved to /var/cache/conftool/dbconfig/20220317-103726-marostegui.json
  • 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22760 and previous config saved to /var/cache/conftool/dbconfig/20220317-103719-marostegui.json
  • 10:31 mvernon@cumin1001: START - Cookbook sre.dns.netbox
  • 10:26 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-fe[1005-1008].eqiad.wmnet
  • 10:24 marostegui: dbmaint on s3@codfw T298556
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22759 and previous config saved to /var/cache/conftool/dbconfig/20220317-102214-marostegui.json
  • 10:10 marostegui: dbmaint on s7@eqiad T298556
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22758 and previous config saved to /var/cache/conftool/dbconfig/20220317-100709-marostegui.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22757 and previous config saved to /var/cache/conftool/dbconfig/20220317-095204-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T298556)', diff saved to https://phabricator.wikimedia.org/P22756 and previous config saved to /var/cache/conftool/dbconfig/20220317-095044-marostegui.json
  • 09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298557)', diff saved to https://phabricator.wikimedia.org/P22755 and previous config saved to /var/cache/conftool/dbconfig/20220317-094025-marostegui.json
  • 09:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22754 and previous config saved to /var/cache/conftool/dbconfig/20220317-094017-marostegui.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22752 and previous config saved to /var/cache/conftool/dbconfig/20220317-092512-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T297189)', diff saved to https://phabricator.wikimedia.org/P22751 and previous config saved to /var/cache/conftool/dbconfig/20220317-091911-marostegui.json
  • 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22750 and previous config saved to /var/cache/conftool/dbconfig/20220317-091007-marostegui.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22749 and previous config saved to /var/cache/conftool/dbconfig/20220317-085502-marostegui.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Clarakosi out of all services on: 1881 hosts
  • 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Clarakosi out of all services on: 1881 hosts
  • 08:24 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 0da40c2: throttle: Remove expired rules (duration: 00m 50s)
  • 08:23 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 980ea35: Throttle: Increase limit for English Wikipedia (T304016) (duration: 00m 51s)
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ppchelko out of all services on: 1881 hosts
  • 08:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ppchelko out of all services on: 1881 hosts
  • 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Accraze out of all services on: 1881 hosts
  • 08:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Accraze out of all services on: 1881 hosts
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22748 and previous config saved to /var/cache/conftool/dbconfig/20220317-080705-root.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298557)', diff saved to https://phabricator.wikimedia.org/P22747 and previous config saved to /var/cache/conftool/dbconfig/20220317-075350-marostegui.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22746 and previous config saved to /var/cache/conftool/dbconfig/20220317-075201-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22745 and previous config saved to /var/cache/conftool/dbconfig/20220317-073658-root.json
  • 07:31 marostegui: dbmaint on s5@eqiad T297189
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22744 and previous config saved to /var/cache/conftool/dbconfig/20220317-072154-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22743 and previous config saved to /var/cache/conftool/dbconfig/20220317-071200-root.json
  • 07:11 ryankemper: [WDQS] Depooled `wdqs2003` (8 hours of lag to catch up on)
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22742 and previous config saved to /var/cache/conftool/dbconfig/20220317-070650-root.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:57 ryankemper: [WDQS] Also of note is the spiking thread counts on the affected hosts: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=22
  • 06:57 ryankemper: [WDQS] Note that per https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1647457172391&to=1647500081971&viewPanel=7 `wdqs2003` has been offline for ~6 hours, `wdqs2001` for 1.5 hours and `wdqs2004` just recently.
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22741 and previous config saved to /var/cache/conftool/dbconfig/20220317-065656-root.json
  • 06:54 ryankemper: [WDQS] `ryankemper@wdqs2003:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 06:53 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 06:50 elukey: restart blazegraph on wdqs2004
  • 06:46 elukey: kill remaining hanging processes for ppche*lko and accra*ze on an-test-client1001 to allow the users to be offboarded (puppet broken)
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22740 and previous config saved to /var/cache/conftool/dbconfig/20220317-064152-root.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22739 and previous config saved to /var/cache/conftool/dbconfig/20220317-062648-root.json
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After buffer pool testing', diff saved to https://phabricator.wikimedia.org/P22738 and previous config saved to /var/cache/conftool/dbconfig/20220317-061144-root.json
  • 04:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300775)', diff saved to https://phabricator.wikimedia.org/P22737 and previous config saved to /var/cache/conftool/dbconfig/20220317-040634-marostegui.json
  • 04:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 02:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 02:07 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 01:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye

2022-03-16

  • 23:52 tzatziki: Removing two files for legal compliance
  • 21:17 cjming: end running skin update preference maintenance script
  • 20:52 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:40 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: [no-op] 8efa537: GrowthExperiments: Set GEWelcomeSurveyShowMailingListQuestion (T303240) (duration: 00m 53s)
  • 20:38 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:35 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/: 9ba157b: Add insert option for update skin preferences script (T299104) (duration: 00m 50s)
  • 20:34 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WikimediaMaintenance/: ebfc516: Add script to update vector skin preferences (T299104) (duration: 00m 51s)
  • 20:32 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 20:13 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 urbanecm@deploy1002: Synchronized docroot/noc/db.php: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 3/3) (duration: 00m 49s)
  • 20:12 urbanecm@deploy1002: Synchronized multiversion/: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 2/3) (duration: 00m 50s)
  • 20:11 urbanecm@deploy1002: Synchronized wmf-config/: f649199: Migrate wmfDatacenter(s) to wmgDatacenter(s) (T45956; 1/3) (duration: 00m 50s)
  • 19:22 otto@deploy1002: Finished deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided) (duration: 07m 50s)
  • 19:14 otto@deploy1002: Started deploy [analytics/refinery@2d2056a] (hadoop-test): (no justification provided)
  • 18:32 sukhe: running: homer "cr*-drmrs*" commit "Gerrit 771359: Set up BGP peering in drmrs for Wikidough."
  • 18:09 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f] (duration: 00m 08s)
  • 18:09 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics_test@257960f]
  • 18:02 aqu@deploy1002: Finished deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f] (duration: 00m 08s)
  • 18:02 aqu@deploy1002: Started deploy [airflow-dags/analytics@257960f]: Migrate session_length/daily from Oozie to Airflow [airflow-dags/analytics@257960f]
  • 18:00 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 18:00 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on karapace1001.eqiad.wmnet with reason: Setting up karapace for the first time
  • 17:36 dancy@deploy1002: Synchronized multiversion/MWMultiVersion.php: Config: mwscript: Support --force-version flag (T303878) (duration: 00m 57s)
  • 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 17:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 17:13 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 07m 23s)
  • 17:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
  • 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (hadoop-test): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
  • 17:06 aqu@deploy1002: Finished deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 00m 07s)
  • 17:06 aqu@deploy1002: Started deploy [analytics/refinery@d039471] (thin): Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:48 aqu@deploy1002: Finished deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471] (duration: 25m 49s)
  • 16:45 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 16:37 Emperor: rolling restart of ms-fe10[09-12] so they know about removal of older proxies T303733
  • 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 16:28 Emperor: moving swiftrepl and stats reporter host from ms-fe1005 to ms-fe1009 T303733
  • 16:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22734 and previous config saved to /var/cache/conftool/dbconfig/20220316-162721-marostegui.json
  • 16:22 aqu@deploy1002: Started deploy [analytics/refinery@d039471]: Migrate session_length/daily from Oozie to Airflow [analytics/refinery@d039471]
  • 16:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22733 and previous config saved to /var/cache/conftool/dbconfig/20220316-161216-marostegui.json
  • 16:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
  • 16:02 aqu: analytics/refinery - scap deploy "Migrate session_length/daily from Oozie to Airflow"
  • 15:59 dancy@deploy1002: Synchronized README: testing mediawiki image build (duration: 02m 11s)
  • 15:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22732 and previous config saved to /var/cache/conftool/dbconfig/20220316-155711-marostegui.json
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298557)', diff saved to https://phabricator.wikimedia.org/P22731 and previous config saved to /var/cache/conftool/dbconfig/20220316-155300-marostegui.json
  • 15:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 15:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:52 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS buster
  • 15:51 moritzm: restarting exim/spamassassin on MXes to pick up new OpenSSL
  • 15:49 urbanecm@deploy1002: Synchronized wmf-config/logos.php: cswiki celebration logo (duration: 00m 49s)
  • 15:46 urbanecm@deploy1002: Synchronized static/images/project-logos/: cswiki celebration logos (duration: 00m 50s)
  • 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:43 dancy@deploy1002: scap failed: RuntimeError dictionary changed size during iteration (duration: 25m 55s)
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22730 and previous config saved to /var/cache/conftool/dbconfig/20220316-154206-marostegui.json
  • 15:38 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 15:37 ryankemper: [WCQS] Restarted updater across fleet to get out jvm sec upgrades: `ryankemper@cumin1001:~$ sudo -E cumin 'wcqs*' 'systemctl restart wcqs-updater.service'`
  • 15:35 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 15:35 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 15:17 dancy@deploy1002: Started scap: testing mediawiki image build
  • 15:15 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restricted/mediaw
  • 15:12 dancy@deploy1002: Started scap: (no justification provided)
  • 15:11 dancy: Testing mediawiki image build on deploy server again
  • 15:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 15:08 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
  • 15:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22729 and previous config saved to /var/cache/conftool/dbconfig/20220316-150433-marostegui.json
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22728 and previous config saved to /var/cache/conftool/dbconfig/20220316-145946-marostegui.json
  • 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:55 sukhe: rolling restart of nginx.service on durum* hosts for OpenSSL updates
  • 14:55 cjming@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/WikimediaMaintenance/T299104.php: Backport: Add script to update vector skin preferences (T299104) (duration: 00m 51s)
  • 14:53 moritzm: restarting nginx/dhcpd on install/apt servers
  • 14:53 sukhe: rolling restart of pdns-recursor.service and dnsdist.service on doh* hosts for OpenSSL updates
  • 14:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:52 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22727 and previous config saved to /var/cache/conftool/dbconfig/20220316-144928-marostegui.json
  • 14:47 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS buster
  • 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 14:45 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 14:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS buster
  • 14:35 XioNoX: add anycast6 peers in drmrs
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22726 and previous config saved to /var/cache/conftool/dbconfig/20220316-143423-marostegui.json
  • 14:25 Emperor: depooling ms-fe100[5-8] prior to decommissioning T303733
  • 14:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22725 and previous config saved to /var/cache/conftool/dbconfig/20220316-141918-marostegui.json
  • 14:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
  • 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T297189)', diff saved to https://phabricator.wikimedia.org/P22724 and previous config saved to /var/cache/conftool/dbconfig/20220316-141708-marostegui.json
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:12 taavi@deploy1002: Synchronized php-1.38.0-wmf.26/extensions/CentralAuth/includes/User/CentralAuthUser.php: Backport: Replace use of deprecated RecentChange::getEngine (T303861) (duration: 00m 51s)
  • 14:10 herron: grafana1002:~# systemctl restart grafana-ldap-users-sync.service T303064
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22723 and previous config saved to /var/cache/conftool/dbconfig/20220316-140203-marostegui.json
  • 13:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS buster
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22722 and previous config saved to /var/cache/conftool/dbconfig/20220316-134658-marostegui.json
  • 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 13:44 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22721 and previous config saved to /var/cache/conftool/dbconfig/20220316-133458-marostegui.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T297189)', diff saved to https://phabricator.wikimedia.org/P22720 and previous config saved to /var/cache/conftool/dbconfig/20220316-133153-marostegui.json
  • 13:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS buster
  • 13:25 krinkle@deploy1002: Synchronized w/static.php: 159dfd21d (duration: 00m 50s)
  • 13:24 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS buster
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22718 and previous config saved to /var/cache/conftool/dbconfig/20220316-131953-marostegui.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22717 and previous config saved to /var/cache/conftool/dbconfig/20220316-131429-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22716 and previous config saved to /var/cache/conftool/dbconfig/20220316-131421-marostegui.json
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Deploy template features to enwiki (T302857) (duration: 00m 50s)
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22715 and previous config saved to /var/cache/conftool/dbconfig/20220316-130448-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22714 and previous config saved to /var/cache/conftool/dbconfig/20220316-125916-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T297189)', diff saved to https://phabricator.wikimedia.org/P22713 and previous config saved to /var/cache/conftool/dbconfig/20220316-125803-marostegui.json
  • 12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T297189)', diff saved to https://phabricator.wikimedia.org/P22712 and previous config saved to /var/cache/conftool/dbconfig/20220316-125755-marostegui.json
  • 12:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22711 and previous config saved to /var/cache/conftool/dbconfig/20220316-124943-marostegui.json
  • 12:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22710 and previous config saved to /var/cache/conftool/dbconfig/20220316-124742-marostegui.json
  • 12:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298294)', diff saved to https://phabricator.wikimedia.org/P22709 and previous config saved to /var/cache/conftool/dbconfig/20220316-124734-marostegui.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22708 and previous config saved to /var/cache/conftool/dbconfig/20220316-124411-marostegui.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22707 and previous config saved to /var/cache/conftool/dbconfig/20220316-124250-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22705 and previous config saved to /var/cache/conftool/dbconfig/20220316-123229-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22704 and previous config saved to /var/cache/conftool/dbconfig/20220316-122906-marostegui.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22703 and previous config saved to /var/cache/conftool/dbconfig/20220316-122745-marostegui.json
  • 12:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS buster
  • 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 12:25 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22702 and previous config saved to /var/cache/conftool/dbconfig/20220316-121724-marostegui.json
  • 12:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS buster
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T297189)', diff saved to https://phabricator.wikimedia.org/P22701 and previous config saved to /var/cache/conftool/dbconfig/20220316-121240-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298294)', diff saved to https://phabricator.wikimedia.org/P22700 and previous config saved to /var/cache/conftool/dbconfig/20220316-120219-marostegui.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298294)', diff saved to https://phabricator.wikimedia.org/P22699 and previous config saved to /var/cache/conftool/dbconfig/20220316-120100-marostegui.json
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298294)', diff saved to https://phabricator.wikimedia.org/P22698 and previous config saved to /var/cache/conftool/dbconfig/20220316-120047-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22697 and previous config saved to /var/cache/conftool/dbconfig/20220316-114542-marostegui.json
  • 11:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T297189)', diff saved to https://phabricator.wikimedia.org/P22695 and previous config saved to /var/cache/conftool/dbconfig/20220316-113200-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22694 and previous config saved to /var/cache/conftool/dbconfig/20220316-113152-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298557)', diff saved to https://phabricator.wikimedia.org/P22693 and previous config saved to /var/cache/conftool/dbconfig/20220316-113057-marostegui.json
  • 11:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22692 and previous config saved to /var/cache/conftool/dbconfig/20220316-113037-marostegui.json
  • 11:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22691 and previous config saved to /var/cache/conftool/dbconfig/20220316-111647-marostegui.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298294)', diff saved to https://phabricator.wikimedia.org/P22690 and previous config saved to /var/cache/conftool/dbconfig/20220316-111532-marostegui.json
  • 11:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS buster
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298294)', diff saved to https://phabricator.wikimedia.org/P22689 and previous config saved to /var/cache/conftool/dbconfig/20220316-110411-marostegui.json
  • 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298294)', diff saved to https://phabricator.wikimedia.org/P22688 and previous config saved to /var/cache/conftool/dbconfig/20220316-110403-marostegui.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22687 and previous config saved to /var/cache/conftool/dbconfig/20220316-110142-marostegui.json
  • 10:55 vgutierrez: rolling upgrade to HAProxy 2.4.15 on cache nodes
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22686 and previous config saved to /var/cache/conftool/dbconfig/20220316-104858-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22685 and previous config saved to /var/cache/conftool/dbconfig/20220316-104637-marostegui.json
  • 10:42 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22684 and previous config saved to /var/cache/conftool/dbconfig/20220316-103353-marostegui.json
  • 10:28 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298294)', diff saved to https://phabricator.wikimedia.org/P22683 and previous config saved to /var/cache/conftool/dbconfig/20220316-101848-marostegui.json
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298294)', diff saved to https://phabricator.wikimedia.org/P22682 and previous config saved to /var/cache/conftool/dbconfig/20220316-101729-marostegui.json
  • 10:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 10 hosts with reason: Maintenance
  • 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 10 hosts with reason: Maintenance
  • 10:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:15 vgutierrez: rolling restart of ats-tls and ats-backend to catch up on OpenSSL updates
  • 10:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22681 and previous config saved to /var/cache/conftool/dbconfig/20220316-101502-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22680 and previous config saved to /var/cache/conftool/dbconfig/20220316-100527-marostegui.json
  • 10:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T297189)', diff saved to https://phabricator.wikimedia.org/P22679 and previous config saved to /var/cache/conftool/dbconfig/20220316-100519-marostegui.json
  • 10:04 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia
  • 10:01 moritzm: installing openssl security updates
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22678 and previous config saved to /var/cache/conftool/dbconfig/20220316-095957-marostegui.json
  • 09:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS stretch
  • 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS stretch
  • 09:55 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22677 and previous config saved to /var/cache/conftool/dbconfig/20220316-095014-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22676 and previous config saved to /var/cache/conftool/dbconfig/20220316-094452-marostegui.json
  • 09:36 dcausse: T293862: manually restarted blazegraph on wdqs1010 with "-agentpath:/usr/lib/libjvmquake.so=1000,1,0,warn=30,touch=/tmp/jvmquake"
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22675 and previous config saved to /var/cache/conftool/dbconfig/20220316-093509-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22674 and previous config saved to /var/cache/conftool/dbconfig/20220316-092947-marostegui.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22673 and previous config saved to /var/cache/conftool/dbconfig/20220316-092742-marostegui.json
  • 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22672 and previous config saved to /var/cache/conftool/dbconfig/20220316-092735-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T297189)', diff saved to https://phabricator.wikimedia.org/P22671 and previous config saved to /var/cache/conftool/dbconfig/20220316-092004-marostegui.json
  • 09:16 moritzm: revert mx1001/mx2001 to the Bullseye version of Exim T303738
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 T303498', diff saved to https://phabricator.wikimedia.org/P22670 and previous config saved to /var/cache/conftool/dbconfig/20220316-091533-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22669 and previous config saved to /var/cache/conftool/dbconfig/20220316-091229-marostegui.json
  • 09:09 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22668 and previous config saved to /var/cache/conftool/dbconfig/20220316-085724-marostegui.json
  • 08:55 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:52 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22667 and previous config saved to /var/cache/conftool/dbconfig/20220316-084219-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T297189)', diff saved to https://phabricator.wikimedia.org/P22666 and previous config saved to /var/cache/conftool/dbconfig/20220316-084140-marostegui.json
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22665 and previous config saved to /var/cache/conftool/dbconfig/20220316-084127-marostegui.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298294)', diff saved to https://phabricator.wikimedia.org/P22664 and previous config saved to /var/cache/conftool/dbconfig/20220316-084011-marostegui.json
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298294)', diff saved to https://phabricator.wikimedia.org/P22663 and previous config saved to /var/cache/conftool/dbconfig/20220316-084003-marostegui.json
  • 08:35 hashar: Restarting CI Jenkins
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22662 and previous config saved to /var/cache/conftool/dbconfig/20220316-082622-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22661 and previous config saved to /var/cache/conftool/dbconfig/20220316-082458-marostegui.json
  • 08:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Change A/V player to videojs in the first batch of production wiki (T248418) (duration: 00m 49s)
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22660 and previous config saved to /var/cache/conftool/dbconfig/20220316-081117-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22659 and previous config saved to /var/cache/conftool/dbconfig/20220316-080953-marostegui.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22658 and previous config saved to /var/cache/conftool/dbconfig/20220316-075612-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T297189)', diff saved to https://phabricator.wikimedia.org/P22657 and previous config saved to /var/cache/conftool/dbconfig/20220316-075502-marostegui.json
  • 07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298294)', diff saved to https://phabricator.wikimedia.org/P22656 and previous config saved to /var/cache/conftool/dbconfig/20220316-075448-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298294)', diff saved to https://phabricator.wikimedia.org/P22655 and previous config saved to /var/cache/conftool/dbconfig/20220316-075248-marostegui.json
  • 07:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22654 and previous config saved to /var/cache/conftool/dbconfig/20220316-075007-marostegui.json
  • 07:49 Amir1: dbmaint on master of s4@eqiad (T298743)
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22653 and previous config saved to /var/cache/conftool/dbconfig/20220316-073502-marostegui.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22652 and previous config saved to /var/cache/conftool/dbconfig/20220316-071957-marostegui.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Maintenance
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298557)', diff saved to https://phabricator.wikimedia.org/P22651 and previous config saved to /var/cache/conftool/dbconfig/20220316-071859-marostegui.json
  • 07:18 urbanecm: UTC morning B&C window done
  • 07:15 urbanecm: Create `testwiki.cx_significant_edits` and `testwiki.cx_section_translation` at s3 (T302371; `mwscript sql.php --wiki=testwiki /srv/mediawiki-staging/php-1.38.0-wmf.26/extensions/ContentTranslation/sql/{section-translations,significant-edits}.sql`)
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4558951: Disable ContentTranslation for non-extended confirmed users on viwiki (T299636) (duration: 00m 51s)
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22650 and previous config saved to /var/cache/conftool/dbconfig/20220316-070452-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22649 and previous config saved to /var/cache/conftool/dbconfig/20220316-070354-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P22648 and previous config saved to /var/cache/conftool/dbconfig/20220316-070033-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22647 and previous config saved to /var/cache/conftool/dbconfig/20220316-065918-marostegui.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22646 and previous config saved to /var/cache/conftool/dbconfig/20220316-064849-marostegui.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298557)', diff saved to https://phabricator.wikimedia.org/P22644 and previous config saved to /var/cache/conftool/dbconfig/20220316-063344-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22643 and previous config saved to /var/cache/conftool/dbconfig/20220316-060008-marostegui.json
  • 06:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22642 and previous config saved to /var/cache/conftool/dbconfig/20220316-055903-marostegui.json
  • 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 05:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298557)', diff saved to https://phabricator.wikimedia.org/P22641 and previous config saved to /var/cache/conftool/dbconfig/20220316-055805-marostegui.json
  • 05:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 05:36 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
  • 05:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1068.eqiad.wmnet with OS stretch
  • 05:14 ryankemper: [WCQS Deploy] Test query passed on commons-query.wikimedia.org; WCQS deploy complete
  • 05:13 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS (duration: 01m 53s)
  • 05:12 ryankemper: [WCQS Deploy] Tests look good following deploy of `0.3.106` to canary `wcqs1002.eqiad.wmnet`, proceeding to rest of fleet
  • 05:11 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611] (wcqs): Deploy 0.3.106 to WCQS
  • 05:11 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 05:11 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 05:09 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@38de611]: 0.3.106 (duration: 06m 36s)
  • 05:03 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.106` on canary `wdqs1003`; proceeding to rest of fleet
  • 05:02 ryankemper@deploy1002: Started deploy [wdqs/wdqs@38de611]: 0.3.106
  • 05:01 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.106`. Pre-deploy tests passing on canary `wdqs1003`
  • 02:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22640 and previous config saved to /var/cache/conftool/dbconfig/20220316-025347-marostegui.json
  • 02:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22639 and previous config saved to /var/cache/conftool/dbconfig/20220316-023842-marostegui.json
  • 02:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22638 and previous config saved to /var/cache/conftool/dbconfig/20220316-022336-marostegui.json
  • 02:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22637 and previous config saved to /var/cache/conftool/dbconfig/20220316-020831-marostegui.json
  • 01:43 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 01:37 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 01:37 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 01:29 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 01:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS buster
  • 00:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 00:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
  • 00:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS buster
  • 00:03 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye

2022-03-15

  • 22:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 22:07 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 22:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 22:06 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 22:05 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 22:04 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 22:03 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 22:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 22:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 22:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 21:59 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 21:56 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:55 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22635 and previous config saved to /var/cache/conftool/dbconfig/20220315-214729-marostegui.json
  • 21:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300775)', diff saved to https://phabricator.wikimedia.org/P22634 and previous config saved to /var/cache/conftool/dbconfig/20220315-214721-marostegui.json
  • 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22633 and previous config saved to /var/cache/conftool/dbconfig/20220315-214133-ladsgroup.json
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 21:36 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS buster
  • 21:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22632 and previous config saved to /var/cache/conftool/dbconfig/20220315-213216-marostegui.json
  • 21:27 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.26/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: provide $owner argument in LoadBalancer::flushPrimarySessions() (T303885) (duration: 00m 53s)
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22631 and previous config saved to /var/cache/conftool/dbconfig/20220315-212628-ladsgroup.json
  • 21:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:17 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22630 and previous config saved to /var/cache/conftool/dbconfig/20220315-211711-marostegui.json
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22629 and previous config saved to /var/cache/conftool/dbconfig/20220315-211123-ladsgroup.json
  • 21:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1026.eqiad.wmnet with OS bullseye
  • 21:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T300775)', diff saved to https://phabricator.wikimedia.org/P22628 and previous config saved to /var/cache/conftool/dbconfig/20220315-210204-marostegui.json
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22627 and previous config saved to /var/cache/conftool/dbconfig/20220315-205702-marostegui.json
  • 20:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22626 and previous config saved to /var/cache/conftool/dbconfig/20220315-205618-ladsgroup.json
  • 20:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 20:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298557)', diff saved to https://phabricator.wikimedia.org/P22625 and previous config saved to /var/cache/conftool/dbconfig/20220315-204912-marostegui.json
  • 20:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22624 and previous config saved to /var/cache/conftool/dbconfig/20220315-204157-marostegui.json
  • 20:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22623 and previous config saved to /var/cache/conftool/dbconfig/20220315-203407-marostegui.json
  • 20:27 bd808: Toolhub: running post-deploy database migrations
  • 20:27 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS buster
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22622 and previous config saved to /var/cache/conftool/dbconfig/20220315-202652-marostegui.json
  • 20:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:21 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 20:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22621 and previous config saved to /var/cache/conftool/dbconfig/20220315-201902-marostegui.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Add another entry to GECampaignPatterns (T302738) (duration: 02m 22s)
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22620 and previous config saved to /var/cache/conftool/dbconfig/20220315-201147-marostegui.json
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298557)', diff saved to https://phabricator.wikimedia.org/P22619 and previous config saved to /var/cache/conftool/dbconfig/20220315-200357-marostegui.json
  • 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22618 and previous config saved to /var/cache/conftool/dbconfig/20220315-195934-ladsgroup.json
  • 19:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298743)', diff saved to https://phabricator.wikimedia.org/P22617 and previous config saved to /var/cache/conftool/dbconfig/20220315-195657-ladsgroup.json
  • 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-tls
  • 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=varnish-fe
  • 19:52 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=drmrs,cluster=cache_text,service=ats-be
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22616 and previous config saved to /var/cache/conftool/dbconfig/20220315-194152-ladsgroup.json
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298557)', diff saved to https://phabricator.wikimedia.org/P22615 and previous config saved to /var/cache/conftool/dbconfig/20220315-193029-marostegui.json
  • 19:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 19:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22614 and previous config saved to /var/cache/conftool/dbconfig/20220315-192647-ladsgroup.json
  • 19:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 19:22 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 19:19 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 19:18 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22612 and previous config saved to /var/cache/conftool/dbconfig/20220315-191234-marostegui.json
  • 19:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P22611 and previous config saved to /var/cache/conftool/dbconfig/20220315-191226-marostegui.json
  • 19:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS buster
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298743)', diff saved to https://phabricator.wikimedia.org/P22610 and previous config saved to /var/cache/conftool/dbconfig/20220315-191140-ladsgroup.json
  • 19:01 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 19:00 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:59 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22609 and previous config saved to /var/cache/conftool/dbconfig/20220315-185721-marostegui.json
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298743)', diff saved to https://phabricator.wikimedia.org/P22608 and previous config saved to /var/cache/conftool/dbconfig/20220315-185413-ladsgroup.json
  • 18:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 18:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298743)', diff saved to https://phabricator.wikimedia.org/P22607 and previous config saved to /var/cache/conftool/dbconfig/20220315-185405-ladsgroup.json
  • 18:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS stretch
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS stretch
  • 18:43 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
  • 18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22606 and previous config saved to /var/cache/conftool/dbconfig/20220315-184216-marostegui.json
  • 18:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22605 and previous config saved to /var/cache/conftool/dbconfig/20220315-183900-ladsgroup.json
  • 18:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
  • 18:32 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1071.eqiad.wmnet with OS buster
  • 18:30 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1069.eqiad.wmnet with OS buster
  • 18:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1070.eqiad.wmnet with OS buster
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P22604 and previous config saved to /var/cache/conftool/dbconfig/20220315-182711-marostegui.json
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22603 and previous config saved to /var/cache/conftool/dbconfig/20220315-182355-ladsgroup.json
  • 18:23 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1071.eqiad.wmnet with reason: host reimage
  • 18:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1070.eqiad.wmnet with reason: host reimage
  • 18:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1069.eqiad.wmnet with reason: host reimage
  • 18:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS buster
  • 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:09 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1071.eqiad.wmnet with OS buster
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1070.eqiad.wmnet with OS buster
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298743)', diff saved to https://phabricator.wikimedia.org/P22602 and previous config saved to /var/cache/conftool/dbconfig/20220315-180850-ladsgroup.json
  • 18:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1069.eqiad.wmnet with OS buster
  • 18:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298557)', diff saved to https://phabricator.wikimedia.org/P22601 and previous config saved to /var/cache/conftool/dbconfig/20220315-180542-marostegui.json
  • 18:04 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.26 refs T300202
  • 17:57 XioNoX: power down mr1-ulsfo for replacement
  • 17:52 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T298563)', diff saved to https://phabricator.wikimedia.org/P22600 and previous config saved to /var/cache/conftool/dbconfig/20220315-175143-marostegui.json
  • 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298563)', diff saved to https://phabricator.wikimedia.org/P22599 and previous config saved to /var/cache/conftool/dbconfig/20220315-175130-marostegui.json
  • 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22598 and previous config saved to /var/cache/conftool/dbconfig/20220315-175037-marostegui.json
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22597 and previous config saved to /var/cache/conftool/dbconfig/20220315-173625-marostegui.json
  • 17:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22596 and previous config saved to /var/cache/conftool/dbconfig/20220315-173532-marostegui.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298743)', diff saved to https://phabricator.wikimedia.org/P22595 and previous config saved to /var/cache/conftool/dbconfig/20220315-172616-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298743)', diff saved to https://phabricator.wikimedia.org/P22594 and previous config saved to /var/cache/conftool/dbconfig/20220315-172608-ladsgroup.json
  • 17:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:25 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P22593 and previous config saved to /var/cache/conftool/dbconfig/20220315-172119-marostegui.json
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298557)', diff saved to https://phabricator.wikimedia.org/P22592 and previous config saved to /var/cache/conftool/dbconfig/20220315-172027-marostegui.json
  • 17:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:12 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22591 and previous config saved to /var/cache/conftool/dbconfig/20220315-171103-ladsgroup.json
  • 17:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T298563)', diff saved to https://phabricator.wikimedia.org/P22590 and previous config saved to /var/cache/conftool/dbconfig/20220315-170614-marostegui.json
  • 17:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T300775)', diff saved to https://phabricator.wikimedia.org/P22589 and previous config saved to /var/cache/conftool/dbconfig/20220315-170201-marostegui.json
  • 17:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 17:01 jhuneidi@deploy1002: Pruned MediaWiki: 1.38.0-wmf.24 (duration: 01m 32s)
  • 16:59 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.26 refs T300202 (duration: 38m 54s)
  • 16:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22588 and previous config saved to /var/cache/conftool/dbconfig/20220315-165558-ladsgroup.json
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298557)', diff saved to https://phabricator.wikimedia.org/P22587 and previous config saved to /var/cache/conftool/dbconfig/20220315-164751-marostegui.json
  • 16:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298557)', diff saved to https://phabricator.wikimedia.org/P22586 and previous config saved to /var/cache/conftool/dbconfig/20220315-164743-marostegui.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298743)', diff saved to https://phabricator.wikimedia.org/P22585 and previous config saved to /var/cache/conftool/dbconfig/20220315-164053-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298743)', diff saved to https://phabricator.wikimedia.org/P22584 and previous config saved to /var/cache/conftool/dbconfig/20220315-163626-ladsgroup.json
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298743)', diff saved to https://phabricator.wikimedia.org/P22583 and previous config saved to /var/cache/conftool/dbconfig/20220315-163618-ladsgroup.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22582 and previous config saved to /var/cache/conftool/dbconfig/20220315-163238-marostegui.json
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T298563)', diff saved to https://phabricator.wikimedia.org/P22581 and previous config saved to /var/cache/conftool/dbconfig/20220315-163134-marostegui.json
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298563)', diff saved to https://phabricator.wikimedia.org/P22580 and previous config saved to /var/cache/conftool/dbconfig/20220315-163126-marostegui.json
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22579 and previous config saved to /var/cache/conftool/dbconfig/20220315-162113-ladsgroup.json
  • 16:20 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.26 refs T300202
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22578 and previous config saved to /var/cache/conftool/dbconfig/20220315-161732-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22577 and previous config saved to /var/cache/conftool/dbconfig/20220315-161621-marostegui.json
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22576 and previous config saved to /var/cache/conftool/dbconfig/20220315-160607-ladsgroup.json
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298557)', diff saved to https://phabricator.wikimedia.org/P22575 and previous config saved to /var/cache/conftool/dbconfig/20220315-160226-marostegui.json
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P22574 and previous config saved to /var/cache/conftool/dbconfig/20220315-160116-marostegui.json
  • 15:53 moritzm: updating Exim on mx1001 T303738
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298743)', diff saved to https://phabricator.wikimedia.org/P22573 and previous config saved to /var/cache/conftool/dbconfig/20220315-155102-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298743)', diff saved to https://phabricator.wikimedia.org/P22572 and previous config saved to /var/cache/conftool/dbconfig/20220315-154639-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22571 and previous config saved to /var/cache/conftool/dbconfig/20220315-154631-ladsgroup.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T298563)', diff saved to https://phabricator.wikimedia.org/P22570 and previous config saved to /var/cache/conftool/dbconfig/20220315-154610-marostegui.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22569 and previous config saved to /var/cache/conftool/dbconfig/20220315-153126-ladsgroup.json
  • 15:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298557)', diff saved to https://phabricator.wikimedia.org/P22568 and previous config saved to /var/cache/conftool/dbconfig/20220315-152916-marostegui.json
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 15:18 moritzm: installing Java updates on wcqs*/wdqs* hosts
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22567 and previous config saved to /var/cache/conftool/dbconfig/20220315-151621-ladsgroup.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T298563)', diff saved to https://phabricator.wikimedia.org/P22566 and previous config saved to /var/cache/conftool/dbconfig/20220315-151206-marostegui.json
  • 15:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 15:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 15:09 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f01214c]: (no justification provided) (duration: 00m 07s)
  • 15:09 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f01214c]: (no justification provided)
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22565 and previous config saved to /var/cache/conftool/dbconfig/20220315-150649-root.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22564 and previous config saved to /var/cache/conftool/dbconfig/20220315-150116-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298743)', diff saved to https://phabricator.wikimedia.org/P22563 and previous config saved to /var/cache/conftool/dbconfig/20220315-145246-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298743)', diff saved to https://phabricator.wikimedia.org/P22562 and previous config saved to /var/cache/conftool/dbconfig/20220315-145238-ladsgroup.json
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22561 and previous config saved to /var/cache/conftool/dbconfig/20220315-145146-root.json
  • 14:50 moritzm: installing postgresql-11 security updates
  • 14:49 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@88d5618]: (no justification provided) (duration: 00m 07s)
  • 14:49 ntsako@deploy1002: Started deploy [airflow-dags/analytics@88d5618]: (no justification provided)
  • 14:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:42 ottomata: I read the cumin output wrong, kafka-jumbo1001 and 1002 restarted successfully before accidental ctrl-c on cumin command. Restarting the full jumbo roll-restart to thoroughly do them all - T303324
  • 14:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
  • 14:39 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:38 ottomata: all brokers except kafka-jumbo1001 were successfully roll restarted, doing kafka-jumbo1001 manually - T303324
  • 14:37 ottomata: accidental cancel of roll restart brokers, re-doing - T303324
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22560 and previous config saved to /var/cache/conftool/dbconfig/20220315-143733-ladsgroup.json
  • 14:37 otto@cumin1001: END (ERROR) - Cookbook sre.kafka.roll-restart-brokers (exit_code=97) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 14:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1068.eqiad.wmnet with reason: host reimage
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22559 and previous config saved to /var/cache/conftool/dbconfig/20220315-143642-root.json
  • 14:32 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@2924232]: (no justification provided) (duration: 00m 08s)
  • 14:32 ntsako@deploy1002: Started deploy [airflow-dags/analytics@2924232]: (no justification provided)
  • 14:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be1068.eqiad.wmnet with OS stretch
  • 14:23 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:22 inflatador: T303256 bking@cumin1001 restarting wdqs services `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-blazegraph'`
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22558 and previous config saved to /var/cache/conftool/dbconfig/20220315-142228-ladsgroup.json
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22557 and previous config saved to /var/cache/conftool/dbconfig/20220315-142138-root.json
  • 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298743)', diff saved to https://phabricator.wikimedia.org/P22556 and previous config saved to /var/cache/conftool/dbconfig/20220315-140723-ladsgroup.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change ', diff saved to https://phabricator.wikimedia.org/P22555 and previous config saved to /var/cache/conftool/dbconfig/20220315-140634-root.json
  • 14:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298557)', diff saved to https://phabricator.wikimedia.org/P22554 and previous config saved to /var/cache/conftool/dbconfig/20220315-140520-marostegui.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298743)', diff saved to https://phabricator.wikimedia.org/P22553 and previous config saved to /var/cache/conftool/dbconfig/20220315-140259-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298743)', diff saved to https://phabricator.wikimedia.org/P22552 and previous config saved to /var/cache/conftool/dbconfig/20220315-140252-ladsgroup.json
  • 14:01 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:00 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 13:59 ottomata: roll restarting kafka jumbo brokers to set max.incremental.fetch.session.cache.slots=2000 - T303324
  • 13:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 13:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22551 and previous config saved to /var/cache/conftool/dbconfig/20220315-135015-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22550 and previous config saved to /var/cache/conftool/dbconfig/20220315-134747-ladsgroup.json
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 awight: EU deployment complete
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22549 and previous config saved to /var/cache/conftool/dbconfig/20220315-133510-marostegui.json
  • 13:34 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [beta] Disable improved template search (T286991, T302857) (take 2) (duration: 00m 50s)
  • 13:32 awight@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: [beta] Disable improved template search (T286991, T302857) (duration: 00m 48s)
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22548 and previous config saved to /var/cache/conftool/dbconfig/20220315-133241-ladsgroup.json
  • 13:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 awight@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: [beta] Remove unused config overrides (duration: 00m 49s)
  • 13:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22547 and previous config saved to /var/cache/conftool/dbconfig/20220315-132857-marostegui.json
  • 13:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298557)', diff saved to https://phabricator.wikimedia.org/P22546 and previous config saved to /var/cache/conftool/dbconfig/20220315-132005-marostegui.json
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298743)', diff saved to https://phabricator.wikimedia.org/P22545 and previous config saved to /var/cache/conftool/dbconfig/20220315-131736-ladsgroup.json
  • 13:15 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/TemplateWizard/resources/ext.TemplateWizard.SearchField.js: Backport: Fix copy-paste mistake in template search widget (T303524) (duration: 00m 49s)
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22544 and previous config saved to /var/cache/conftool/dbconfig/20220315-131436-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22543 and previous config saved to /var/cache/conftool/dbconfig/20220315-131352-marostegui.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298743)', diff saved to https://phabricator.wikimedia.org/P22542 and previous config saved to /var/cache/conftool/dbconfig/20220315-131311-ladsgroup.json
  • 13:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298743)', diff saved to https://phabricator.wikimedia.org/P22541 and previous config saved to /var/cache/conftool/dbconfig/20220315-131303-ladsgroup.json
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22540 and previous config saved to /var/cache/conftool/dbconfig/20220315-130936-marostegui.json
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 Amir1: removed 440 more corrupt rows in flaggedtemplates in dewiki (T297189)
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22539 and previous config saved to /var/cache/conftool/dbconfig/20220315-125847-marostegui.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22538 and previous config saved to /var/cache/conftool/dbconfig/20220315-125758-ladsgroup.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22537 and previous config saved to /var/cache/conftool/dbconfig/20220315-125431-marostegui.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298557)', diff saved to https://phabricator.wikimedia.org/P22536 and previous config saved to /var/cache/conftool/dbconfig/20220315-125228-marostegui.json
  • 12:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:48 Amir1: removed 170 corrupt rows in flaggedtemplates in dewiki (T297189)
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22535 and previous config saved to /var/cache/conftool/dbconfig/20220315-124342-marostegui.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22534 and previous config saved to /var/cache/conftool/dbconfig/20220315-124253-ladsgroup.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22533 and previous config saved to /var/cache/conftool/dbconfig/20220315-123926-marostegui.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298743)', diff saved to https://phabricator.wikimedia.org/P22532 and previous config saved to /var/cache/conftool/dbconfig/20220315-122748-ladsgroup.json
  • 12:24 moritzm: updating Exim on mx2001 T303738
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22531 and previous config saved to /var/cache/conftool/dbconfig/20220315-122421-marostegui.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298743)', diff saved to https://phabricator.wikimedia.org/P22530 and previous config saved to /var/cache/conftool/dbconfig/20220315-121317-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298743)', diff saved to https://phabricator.wikimedia.org/P22529 and previous config saved to /var/cache/conftool/dbconfig/20220315-121309-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22528 and previous config saved to /var/cache/conftool/dbconfig/20220315-115804-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22527 and previous config saved to /var/cache/conftool/dbconfig/20220315-114259-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298743)', diff saved to https://phabricator.wikimedia.org/P22526 and previous config saved to /var/cache/conftool/dbconfig/20220315-112754-ladsgroup.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298743)', diff saved to https://phabricator.wikimedia.org/P22525 and previous config saved to /var/cache/conftool/dbconfig/20220315-112308-ladsgroup.json
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22524 and previous config saved to /var/cache/conftool/dbconfig/20220315-110423-marostegui.json
  • 11:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22523 and previous config saved to /var/cache/conftool/dbconfig/20220315-110416-marostegui.json
  • 10:50 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22522 and previous config saved to /var/cache/conftool/dbconfig/20220315-104910-marostegui.json
  • 10:49 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22521 and previous config saved to /var/cache/conftool/dbconfig/20220315-103405-marostegui.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22520 and previous config saved to /var/cache/conftool/dbconfig/20220315-101922-root.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22519 and previous config saved to /var/cache/conftool/dbconfig/20220315-101900-marostegui.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22518 and previous config saved to /var/cache/conftool/dbconfig/20220315-101449-root.json
  • 10:13 Amir1: start of foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=AUDIO --sleep 2 --oldimage (T226311)
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22517 and previous config saved to /var/cache/conftool/dbconfig/20220315-100418-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22516 and previous config saved to /var/cache/conftool/dbconfig/20220315-095945-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22515 and previous config saved to /var/cache/conftool/dbconfig/20220315-094914-root.json
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22514 and previous config saved to /var/cache/conftool/dbconfig/20220315-094441-root.json
  • 09:38 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22513 and previous config saved to /var/cache/conftool/dbconfig/20220315-093410-root.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22512 and previous config saved to /var/cache/conftool/dbconfig/20220315-092937-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P22511 and previous config saved to /var/cache/conftool/dbconfig/20220315-091906-root.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22510 and previous config saved to /var/cache/conftool/dbconfig/20220315-091850-root.json
  • 09:14 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22509 and previous config saved to /var/cache/conftool/dbconfig/20220315-091433-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22507 and previous config saved to /var/cache/conftool/dbconfig/20220315-090346-root.json
  • 09:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22506 and previous config saved to /var/cache/conftool/dbconfig/20220315-085929-root.json
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:57 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env (duration: 01m 55s)
  • 08:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (eqiad): Switchover to eqiad tegola on eqiad env
  • 08:53 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env (duration: 03m 22s)
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T298563)', diff saved to https://phabricator.wikimedia.org/P22505 and previous config saved to /var/cache/conftool/dbconfig/20220315-085214-marostegui.json
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22504 and previous config saved to /var/cache/conftool/dbconfig/20220315-085206-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1095:3315', diff saved to https://phabricator.wikimedia.org/P22503 and previous config saved to /var/cache/conftool/dbconfig/20220315-085026-marostegui.json
  • 08:50 marostegui: dbmaint on s5@eqiad T297189
  • 08:49 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@a85cf25] (codfw): Switchover to eqiad tegola on eqiad env
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22502 and previous config saved to /var/cache/conftool/dbconfig/20220315-084842-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22501 and previous config saved to /var/cache/conftool/dbconfig/20220315-084425-root.json
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1161.eqiad.wmnet with OS bullseye
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22500 and previous config saved to /var/cache/conftool/dbconfig/20220315-083925-marostegui.json
  • 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300775)', diff saved to https://phabricator.wikimedia.org/P22499 and previous config saved to /var/cache/conftool/dbconfig/20220315-083917-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22498 and previous config saved to /var/cache/conftool/dbconfig/20220315-083701-marostegui.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22497 and previous config saved to /var/cache/conftool/dbconfig/20220315-083338-root.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
  • 08:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1161.eqiad.wmnet with reason: host reimage
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22496 and previous config saved to /var/cache/conftool/dbconfig/20220315-082412-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22495 and previous config saved to /var/cache/conftool/dbconfig/20220315-082401-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22493 and previous config saved to /var/cache/conftool/dbconfig/20220315-082157-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22492 and previous config saved to /var/cache/conftool/dbconfig/20220315-081835-root.json
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1161.eqiad.wmnet with OS bullseye
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22491 and previous config saved to /var/cache/conftool/dbconfig/20220315-080907-marostegui.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22490 and previous config saved to /var/cache/conftool/dbconfig/20220315-080857-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22489 and previous config saved to /var/cache/conftool/dbconfig/20220315-080651-marostegui.json
  • 08:05 marostegui: dbmaint on s5@eqiad T300473
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22488 and previous config saved to /var/cache/conftool/dbconfig/20220315-080329-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P22487 and previous config saved to /var/cache/conftool/dbconfig/20220315-080128-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T300775)', diff saved to https://phabricator.wikimedia.org/P22486 and previous config saved to /var/cache/conftool/dbconfig/20220315-075402-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22485 and previous config saved to /var/cache/conftool/dbconfig/20220315-075353-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22484 and previous config saved to /var/cache/conftool/dbconfig/20220315-074825-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22483 and previous config saved to /var/cache/conftool/dbconfig/20220315-074650-root.json
  • 07:43 elukey: restart kube-api server on ml-serve-ctrl2002 - 504 responses registered, corresponding to high custom resource definition requests
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22482 and previous config saved to /var/cache/conftool/dbconfig/20220315-073849-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22481 and previous config saved to /var/cache/conftool/dbconfig/20220315-073146-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22480 and previous config saved to /var/cache/conftool/dbconfig/20220315-072345-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22479 and previous config saved to /var/cache/conftool/dbconfig/20220315-071642-root.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22478 and previous config saved to /var/cache/conftool/dbconfig/20220315-070841-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T298563)', diff saved to https://phabricator.wikimedia.org/P22477 and previous config saved to /var/cache/conftool/dbconfig/20220315-070635-marostegui.json
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22476 and previous config saved to /var/cache/conftool/dbconfig/20220315-070138-root.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1166.eqiad.wmnet with OS bullseye
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P22475 and previous config saved to /var/cache/conftool/dbconfig/20220315-065337-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22474 and previous config saved to /var/cache/conftool/dbconfig/20220315-064634-root.json
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: host reimage
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22473 and previous config saved to /var/cache/conftool/dbconfig/20220315-063130-root.json
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1166.eqiad.wmnet with OS bullseye
  • 06:26 marostegui: dbmaint on s3@eqiad T300600
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P22472 and previous config saved to /var/cache/conftool/dbconfig/20220315-062543-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22471 and previous config saved to /var/cache/conftool/dbconfig/20220315-061626-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T300775)', diff saved to https://phabricator.wikimedia.org/P22470 and previous config saved to /var/cache/conftool/dbconfig/20220315-061458-marostegui.json
  • 06:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300775)', diff saved to https://phabricator.wikimedia.org/P22469 and previous config saved to /var/cache/conftool/dbconfig/20220315-061450-marostegui.json
  • 06:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22468 and previous config saved to /var/cache/conftool/dbconfig/20220315-055945-marostegui.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22467 and previous config saved to /var/cache/conftool/dbconfig/20220315-054440-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T300775)', diff saved to https://phabricator.wikimedia.org/P22466 and previous config saved to /var/cache/conftool/dbconfig/20220315-052935-marostegui.json
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T300775)', diff saved to https://phabricator.wikimedia.org/P22465 and previous config saved to /var/cache/conftool/dbconfig/20220315-013013-marostegui.json
  • 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 01:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22464 and previous config saved to /var/cache/conftool/dbconfig/20220315-013000-marostegui.json
  • 01:26 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/maintenance/populateGlobalEditCount.php: fix script bug gerrit 770058 (duration: 00m 50s)
  • 01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22463 and previous config saved to /var/cache/conftool/dbconfig/20220315-011455-marostegui.json
  • 00:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22462 and previous config saved to /var/cache/conftool/dbconfig/20220315-005950-marostegui.json
  • 00:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22461 and previous config saved to /var/cache/conftool/dbconfig/20220315-004445-marostegui.json
  • 00:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2022-03-14

  • 23:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300775)', diff saved to https://phabricator.wikimedia.org/P22460 and previous config saved to /var/cache/conftool/dbconfig/20220314-234430-marostegui.json
  • 23:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 23:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:28 ryankemper: T301108 `ryankemper@cumin1001:~$ sudo cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "moving away from legacy updater" --blazegraph_instance wikidata --without-lvs --task-id T301108` on tmux `wdqs`
  • 22:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 22:16 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 22:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 22:03 bking@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal,name=eqiad
  • 22:03 bking@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 22:03 inflatador: T302494 bking@puppetmaster1001 depooling eqiad in DNS-discovery for wdqs and wdqs-internal services
  • 21:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:39 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:38 inflatador: T302494 bking@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal,name=codfw
  • 21:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:37 bking@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=codfw
  • 21:36 inflatador: bking@cumin pooling codfw in DNS-discovery for wdqs and wdqs-internal services
  • 21:31 sbassett: Deployed security fix for T160800
  • 21:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 20:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:54 urbanecm: UTC late B&C completed
  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bca9c94: liwiktionary: Change timezone to CET/CEST (T303734) (duration: 00m 49s)
  • 20:45 ebernhardson@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CirrusSearch/profiles/SaneitizeProfiles.config.php: Backport: Cut saneitizer re-indexing rate in half (T302733) (duration: 00m 49s)
  • 20:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 20:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1023.eqiad.wmnet with reason: host reimage
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:30 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1024.eqiad.wmnet with OS bullseye
  • 20:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 19:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300775)', diff saved to https://phabricator.wikimedia.org/P22457 and previous config saved to /var/cache/conftool/dbconfig/20220314-194404-marostegui.json
  • 19:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22456 and previous config saved to /var/cache/conftool/dbconfig/20220314-192859-marostegui.json
  • 19:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1022.eqiad.wmnet with OS bullseye
  • 19:22 ejegg: updated civicrm from 252269c8 to 52c45874
  • 19:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22455 and previous config saved to /var/cache/conftool/dbconfig/20220314-191354-marostegui.json
  • 19:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1022.eqiad.wmnet with reason: host reimage
  • 19:04 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1022.eqiad.wmnet with reason: host reimage
  • 19:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22454 and previous config saved to /var/cache/conftool/dbconfig/20220314-190224-marostegui.json
  • 18:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T300775)', diff saved to https://phabricator.wikimedia.org/P22453 and previous config saved to /var/cache/conftool/dbconfig/20220314-185849-marostegui.json
  • 18:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1022.eqiad.wmnet with OS bullseye
  • 18:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22452 and previous config saved to /var/cache/conftool/dbconfig/20220314-184719-marostegui.json
  • 18:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P22451 and previous config saved to /var/cache/conftool/dbconfig/20220314-183214-marostegui.json
  • 18:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1021.eqiad.wmnet with reason: host reimage
  • 18:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1021.eqiad.wmnet with reason: host reimage
  • 18:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22450 and previous config saved to /var/cache/conftool/dbconfig/20220314-181709-marostegui.json
  • 18:14 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22449 and previous config saved to /var/cache/conftool/dbconfig/20220314-175352-marostegui.json
  • 17:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:47 Amir1: start of foreachwikiindblist all maintenance/refreshImageMetadata.php --force --verbose --mediatype=AUDIO --sleep 2 (T226311)
  • 17:45 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@63af538] (eqiad): Enable 100% traffic mirroring on eqiad (duration: 01m 04s)
  • 17:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@63af538] (eqiad): Enable 100% traffic mirroring on eqiad
  • 17:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298294)', diff saved to https://phabricator.wikimedia.org/P22448 and previous config saved to /var/cache/conftool/dbconfig/20220314-173442-marostegui.json
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22446 and previous config saved to /var/cache/conftool/dbconfig/20220314-171937-marostegui.json
  • 17:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P22445 and previous config saved to /var/cache/conftool/dbconfig/20220314-170432-marostegui.json
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T298294)', diff saved to https://phabricator.wikimedia.org/P22444 and previous config saved to /var/cache/conftool/dbconfig/20220314-164927-marostegui.json
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T298294)', diff saved to https://phabricator.wikimedia.org/P22442 and previous config saved to /var/cache/conftool/dbconfig/20220314-162509-marostegui.json
  • 16:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298294)', diff saved to https://phabricator.wikimedia.org/P22441 and previous config saved to /var/cache/conftool/dbconfig/20220314-162501-marostegui.json
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
  • 16:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298563)', diff saved to https://phabricator.wikimedia.org/P22440 and previous config saved to /var/cache/conftool/dbconfig/20220314-161943-marostegui.json
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22439 and previous config saved to /var/cache/conftool/dbconfig/20220314-160955-marostegui.json
  • 16:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22438 and previous config saved to /var/cache/conftool/dbconfig/20220314-160438-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P22437 and previous config saved to /var/cache/conftool/dbconfig/20220314-155450-marostegui.json
  • 15:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22436 and previous config saved to /var/cache/conftool/dbconfig/20220314-154933-marostegui.json
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T298294)', diff saved to https://phabricator.wikimedia.org/P22435 and previous config saved to /var/cache/conftool/dbconfig/20220314-153945-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298563)', diff saved to https://phabricator.wikimedia.org/P22434 and previous config saved to /var/cache/conftool/dbconfig/20220314-153428-marostegui.json
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T298294)', diff saved to https://phabricator.wikimedia.org/P22432 and previous config saved to /var/cache/conftool/dbconfig/20220314-151025-marostegui.json
  • 15:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 15:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 15:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298294)', diff saved to https://phabricator.wikimedia.org/P22431 and previous config saved to /var/cache/conftool/dbconfig/20220314-151017-marostegui.json
  • 14:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22430 and previous config saved to /var/cache/conftool/dbconfig/20220314-145512-marostegui.json
  • 14:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T300775)', diff saved to https://phabricator.wikimedia.org/P22429 and previous config saved to /var/cache/conftool/dbconfig/20220314-145345-marostegui.json
  • 14:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T298563)', diff saved to https://phabricator.wikimedia.org/P22428 and previous config saved to /var/cache/conftool/dbconfig/20220314-145109-marostegui.json
  • 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P22427 and previous config saved to /var/cache/conftool/dbconfig/20220314-144007-marostegui.json
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T298294)', diff saved to https://phabricator.wikimedia.org/P22426 and previous config saved to /var/cache/conftool/dbconfig/20220314-142502-marostegui.json
  • 14:01 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1010.eqiad.wmnet
  • 14:01 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1010.eqiad.wmnet
  • 14:01 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1010.eqiad.wmnet
  • 14:01 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1010.eqiad.wmnet
  • 13:59 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1017.eqiad.wmnet with reason: host reimage
  • 13:58 herron: grafana1002:~# systemctl restart grafana-ldap-users-sync.service T303064
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T298294)', diff saved to https://phabricator.wikimedia.org/P22425 and previous config saved to /var/cache/conftool/dbconfig/20220314-135744-marostegui.json
  • 13:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298294)', diff saved to https://phabricator.wikimedia.org/P22424 and previous config saved to /var/cache/conftool/dbconfig/20220314-135736-marostegui.json
  • 13:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 13:57 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1011.eqiad.wmnet
  • 13:57 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1011.eqiad.wmnet
  • 13:56 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1011.eqiad.wmnet
  • 13:56 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1011.eqiad.wmnet
  • 13:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1017.eqiad.wmnet with reason: host reimage
  • 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:53 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1016.eqiad.wmnet with reason: host reimage
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1012.eqiad.wmnet
  • 13:50 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1012.eqiad.wmnet
  • 13:50 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1012.eqiad.wmnet
  • 13:49 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1012.eqiad.wmnet
  • 13:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:45 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1009.eqiad.wmnet
  • 13:45 mvernon@cumin1001: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1009.eqiad.wmnet
  • 13:45 mvernon@cumin1001: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1009.eqiad.wmnet
  • 13:45 mvernon@cumin1001: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1009.eqiad.wmnet
  • 13:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 13:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298563)', diff saved to https://phabricator.wikimedia.org/P22423 and previous config saved to /var/cache/conftool/dbconfig/20220314-134356-marostegui.json
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:43 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22422 and previous config saved to /var/cache/conftool/dbconfig/20220314-134231-marostegui.json
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 Emperor: restarting swift-proxy on ms-fe100[5-8] to update config to know about new eqiad frontends T303698
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22421 and previous config saved to /var/cache/conftool/dbconfig/20220314-132849-marostegui.json
  • 13:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P22420 and previous config saved to /var/cache/conftool/dbconfig/20220314-132726-marostegui.json
  • 13:25 dcausse: restarting blazegraph on wdqs1006 (jvm stuck for 10 hours)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:20 urbanecm@deploy1002: Synchronized static/images/project-logos/: 3fa9683: Delete huwiki 500k milestone logo files (T301923) (duration: 00m 49s)
  • 13:18 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 3c2c8b0: Stop using huwiki 500k milestone logo (T301923) (duration: 00m 48s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22419 and previous config saved to /var/cache/conftool/dbconfig/20220314-131344-marostegui.json
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T298294)', diff saved to https://phabricator.wikimedia.org/P22418 and previous config saved to /var/cache/conftool/dbconfig/20220314-131220-marostegui.json
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:10 urbanecm@deploy1002: Synchronized wmf-config/wikitech.php: 95f376a: wikitech: migrate wmf* to wmg* (T45956) (duration: 00m 48s)
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298563)', diff saved to https://phabricator.wikimedia.org/P22417 and previous config saved to /var/cache/conftool/dbconfig/20220314-125839-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T298294)', diff saved to https://phabricator.wikimedia.org/P22416 and previous config saved to /var/cache/conftool/dbconfig/20220314-124911-marostegui.json
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22415 and previous config saved to /var/cache/conftool/dbconfig/20220314-124902-marostegui.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22414 and previous config saved to /var/cache/conftool/dbconfig/20220314-123357-marostegui.json
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T298563)', diff saved to https://phabricator.wikimedia.org/P22413 and previous config saved to /var/cache/conftool/dbconfig/20220314-121937-marostegui.json
  • 12:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P22412 and previous config saved to /var/cache/conftool/dbconfig/20220314-121852-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22411 and previous config saved to /var/cache/conftool/dbconfig/20220314-120347-marostegui.json
  • 11:55 moritzm: restarting nginx on archiva1002 to pick up security updates
  • 11:53 moritzm: restarting apache2 on matomo1002 to pick up security updates
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T298294)', diff saved to https://phabricator.wikimedia.org/P22410 and previous config saved to /var/cache/conftool/dbconfig/20220314-114312-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298294)', diff saved to https://phabricator.wikimedia.org/P22409 and previous config saved to /var/cache/conftool/dbconfig/20220314-114305-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22408 and previous config saved to /var/cache/conftool/dbconfig/20220314-112759-marostegui.json
  • 11:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298563)', diff saved to https://phabricator.wikimedia.org/P22407 and previous config saved to /var/cache/conftool/dbconfig/20220314-112117-marostegui.json
  • 11:18 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c8a9efd] (eqiad): Enable mirroring on eqiad with 50% of the traffic (duration: 02m 38s)
  • 11:15 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c8a9efd] (eqiad): Enable mirroring on eqiad with 50% of the traffic
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P22406 and previous config saved to /var/cache/conftool/dbconfig/20220314-111255-marostegui.json
  • 11:12 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@594f1d5] (eqiad): Revert "Revert "Mirror 100% of request to tegola in eqiad"" (duration: 07m 01s)
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22405 and previous config saved to /var/cache/conftool/dbconfig/20220314-110612-marostegui.json
  • 11:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@594f1d5] (eqiad): Revert "Revert "Mirror 100% of request to tegola in eqiad""
  • 11:04 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@594f1d5] (codfw): Revert "Revert "Mirror 100% of request to tegola in eqiad"" (duration: 01m 30s)
  • 11:03 mbsantos@deploy1002: Started deploy [kartotherian/deploy@594f1d5] (codfw): Revert "Revert "Mirror 100% of request to tegola in eqiad""
  • 10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T298294)', diff saved to https://phabricator.wikimedia.org/P22404 and previous config saved to /var/cache/conftool/dbconfig/20220314-105749-marostegui.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22403 and previous config saved to /var/cache/conftool/dbconfig/20220314-105107-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298563)', diff saved to https://phabricator.wikimedia.org/P22402 and previous config saved to /var/cache/conftool/dbconfig/20220314-103602-marostegui.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T298294)', diff saved to https://phabricator.wikimedia.org/P22401 and previous config saved to /var/cache/conftool/dbconfig/20220314-103532-marostegui.json
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298294)', diff saved to https://phabricator.wikimedia.org/P22400 and previous config saved to /var/cache/conftool/dbconfig/20220314-103525-marostegui.json
  • 10:29 _joe_: running puppet on all cp hosts, to introduce the cloud netmapping
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22399 and previous config saved to /var/cache/conftool/dbconfig/20220314-102020-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P22398 and previous config saved to /var/cache/conftool/dbconfig/20220314-100515-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T298563)', diff saved to https://phabricator.wikimedia.org/P22397 and previous config saved to /var/cache/conftool/dbconfig/20220314-095353-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298563)', diff saved to https://phabricator.wikimedia.org/P22396 and previous config saved to /var/cache/conftool/dbconfig/20220314-095346-marostegui.json
  • 09:53 Emperor: rebooting ms-fe10[09-12] as part of bringing into service T303698
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T298294)', diff saved to https://phabricator.wikimedia.org/P22395 and previous config saved to /var/cache/conftool/dbconfig/20220314-095009-marostegui.json
  • 09:48 Amir1: dbmaint on s2@eqiad (T298743)
  • 09:46 Amir1: dbmaint on s8@eqiad (T298743)
  • 09:46 Amir1: dbmaint on s1@eqiad (T298743)
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22394 and previous config saved to /var/cache/conftool/dbconfig/20220314-093840-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T298294)', diff saved to https://phabricator.wikimedia.org/P22393 and previous config saved to /var/cache/conftool/dbconfig/20220314-092559-marostegui.json
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298294)', diff saved to https://phabricator.wikimedia.org/P22392 and previous config saved to /var/cache/conftool/dbconfig/20220314-092551-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22391 and previous config saved to /var/cache/conftool/dbconfig/20220314-092335-marostegui.json
  • 09:18 moritzm: installing vim security updates
  • 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2017.codfw.wmnet with OS bullseye
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22390 and previous config saved to /var/cache/conftool/dbconfig/20220314-091046-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298563)', diff saved to https://phabricator.wikimedia.org/P22389 and previous config saved to /var/cache/conftool/dbconfig/20220314-090830-marostegui.json
  • 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 09:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2017.codfw.wmnet with reason: host reimage
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22388 and previous config saved to /var/cache/conftool/dbconfig/20220314-085541-marostegui.json
  • 08:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2017.codfw.wmnet with OS bullseye
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T298294)', diff saved to https://phabricator.wikimedia.org/P22387 and previous config saved to /var/cache/conftool/dbconfig/20220314-084036-marostegui.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T298563)', diff saved to https://phabricator.wikimedia.org/P22386 and previous config saved to /var/cache/conftool/dbconfig/20220314-082846-marostegui.json
  • 08:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298563)', diff saved to https://phabricator.wikimedia.org/P22385 and previous config saved to /var/cache/conftool/dbconfig/20220314-082838-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T298294)', diff saved to https://phabricator.wikimedia.org/P22384 and previous config saved to /var/cache/conftool/dbconfig/20220314-081836-marostegui.json
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298294)', diff saved to https://phabricator.wikimedia.org/P22383 and previous config saved to /var/cache/conftool/dbconfig/20220314-081828-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22382 and previous config saved to /var/cache/conftool/dbconfig/20220314-081333-marostegui.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22381 and previous config saved to /var/cache/conftool/dbconfig/20220314-080323-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22380 and previous config saved to /var/cache/conftool/dbconfig/20220314-075828-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22379 and previous config saved to /var/cache/conftool/dbconfig/20220314-074818-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298563)', diff saved to https://phabricator.wikimedia.org/P22378 and previous config saved to /var/cache/conftool/dbconfig/20220314-074323-marostegui.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T298294)', diff saved to https://phabricator.wikimedia.org/P22377 and previous config saved to /var/cache/conftool/dbconfig/20220314-073313-marostegui.json
  • 07:18 elukey: restart varnishkafka-webrequest on cp6001 to test a metric issue
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 07:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:11 marostegui: dbmaint on s7@eqiad T300775
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T298563)', diff saved to https://phabricator.wikimedia.org/P22376 and previous config saved to /var/cache/conftool/dbconfig/20220314-070721-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T298294)', diff saved to https://phabricator.wikimedia.org/P22375 and previous config saved to /var/cache/conftool/dbconfig/20220314-070404-marostegui.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 06:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 12 hosts with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 12 hosts with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 06:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
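
The cookbook entries above come in START/END pairs, with the newest entry first. As a rough sketch only (the regular expression and the helper names `parse_cookbook` and `pair_runs` are hypothetical and inferred from the shape of these lines, not taken from any official log grammar or tool), such pairs could be matched up like this:

```python
import re

# Hypothetical pattern inferred from the entries in this log.
ENTRY = re.compile(
    r"^\s*•\s*(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<state>START|END \((?P<result>\w+)\)) - Cookbook (?P<rest>.*)$"
)


def parse_cookbook(line):
    """Return a dict for a cookbook START/END entry, or None for other lines."""
    m = ENTRY.match(line)
    if m is None:
        return None
    # END lines carry an extra "(exit_code=N)" token; drop it so that the
    # description matches the one on the corresponding START line.
    run = re.sub(r"\(exit_code=\d+\) ?", "", m["rest"]).strip()
    return {
        "time": m["time"],
        "who": m["who"],
        "state": "START" if m["state"] == "START" else "END",
        "result": m["result"],  # None for START lines
        "run": run,
    }


def pair_runs(lines):
    """Pair START and END entries; returns (start, end, result, run) tuples."""
    open_runs = {}
    pairs = []
    for line in reversed(lines):  # the log is newest-first
        entry = parse_cookbook(line)
        if entry is None:
            continue
        key = (entry["who"], entry["run"])
        if entry["state"] == "START":
            open_runs[key] = entry["time"]
        elif key in open_runs:
            pairs.append((open_runs.pop(key), entry["time"], entry["result"], entry["run"]))
    return pairs


sample = [
    "  • 06:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance",
    "  • 06:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance",
]
for start, end, result, run in pair_runs(sample):
    print(start, end, result, run)
```

An END entry whose START falls outside the lines passed in is simply skipped, which seems the safer choice when working on an excerpt of the archive.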

2022-03-11

  • 15:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2014.codfw.wmnet with OS bullseye
  • 15:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 15:42 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2014.codfw.wmnet with reason: host reimage
  • 15:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:38 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:36 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:36 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 15:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 15:27 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2014.codfw.wmnet with OS bullseye
  • 15:07 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host kubernetes2013.codfw.wmnet with OS bullseye
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After reboot', diff saved to https://phabricator.wikimedia.org/P22374 and previous config saved to /var/cache/conftool/dbconfig/20220311-150702-root.json
  • 15:02 XioNoX: cr1/2-eqiad AVOID-PATHS as-path TI "6762 .*"
  • 15:02 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*" <- rolled back
  • 14:57 XioNoX: cr2-esams AVOID-PATHS as-path TI "6762 .*"
  • 14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 14:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After reboot', diff saved to https://phabricator.wikimedia.org/P22373 and previous config saved to /var/cache/conftool/dbconfig/20220311-145159-root.json
  • 14:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2013.codfw.wmnet with reason: host reimage
  • 14:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After reboot', diff saved to https://phabricator.wikimedia.org/P22372 and previous config saved to /var/cache/conftool/dbconfig/20220311-143652-root.json
  • 14:35 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2013.codfw.wmnet with OS bullseye
  • 14:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After reboot', diff saved to https://phabricator.wikimedia.org/P22371 and previous config saved to /var/cache/conftool/dbconfig/20220311-142147-root.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: After reboot', diff saved to https://phabricator.wikimedia.org/P22370 and previous config saved to /var/cache/conftool/dbconfig/20220311-140641-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P22369 and previous config saved to /var/cache/conftool/dbconfig/20220311-140549-marostegui.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After reboot', diff saved to https://phabricator.wikimedia.org/P22368 and previous config saved to /var/cache/conftool/dbconfig/20220311-135137-root.json
  • 13:49 marostegui: dbmaint on s8@eqiad T300775
  • 13:49 marostegui: dbmaint on s1@eqiad T298294
  • 13:43 jelto: update pcc facts
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: After reboot', diff saved to https://phabricator.wikimedia.org/P22367 and previous config saved to /var/cache/conftool/dbconfig/20220311-133633-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P22366 and previous config saved to /var/cache/conftool/dbconfig/20220311-133407-marostegui.json
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cumin2001.codfw.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts cumin2001.codfw.wmnet
  • 11:18 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2012.codfw.wmnet with OS bullseye
  • 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 11:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 11:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2012.codfw.wmnet with reason: host reimage
  • 10:59 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2110.codfw.wmnet with OS bullseye
  • 10:46 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2012.codfw.wmnet with OS bullseye
  • 10:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2011.codfw.wmnet with OS bullseye
  • 10:39 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2110.codfw.wmnet with reason: host reimage
  • 10:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 10:25 vgutierrez: disable certspotter - T303593
  • 10:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2011.codfw.wmnet with reason: host reimage
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2110.codfw.wmnet with OS bullseye
  • 10:16 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2011.codfw.wmnet with OS bullseye
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 10:03 dcausse: manually installed jvmquake on wdqs1010 (test machine) from https://people.wikimedia.org/~jmm/jvmquake/
  • 09:54 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:47 vgutierrez: stopping certspotter on alert1001
  • 09:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 09:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 09:36 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:35 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 09:15 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 09:00 jayme: kubernetes2011:~# systemctl restart rsyslog.service - T289766
  • 08:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 08:51 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
  • 08:43 dcausse: restarting blazegraph on wdqs1012 (JVM stuck for 5 hours)
  • 08:42 jynus: upgrade and restart db2139
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
  • 08:30 jynus: upgrade and restart db1145
  • 08:23 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1017.eqiad.wmnet
  • 08:21 ayounsi@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1017.eqiad.wmnet
  • 08:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22364 and previous config saved to /var/cache/conftool/dbconfig/20220311-063921-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22363 and previous config saved to /var/cache/conftool/dbconfig/20220311-062417-root.json
  • 06:13 marostegui: Reboot dbproxy1014 T303174
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22362 and previous config saved to /var/cache/conftool/dbconfig/20220311-060913-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22361 and previous config saved to /var/cache/conftool/dbconfig/20220311-055409-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P22360 and previous config saved to /var/cache/conftool/dbconfig/20220311-054514-marostegui.json
  • 02:54 eileen: civicrm revision changed from 9fb68b24 to 252269c8
  • 01:56 eileen: civicrm revision changed from 8501c38c to 9fb68b24
  • 01:31 eileen: civicrm revision changed from 4cb2bdbc to 8501c38c
  • 00:33 TimStarling: on mwmaint1002 running populateGlobalEditCount.php
  • 00:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 00:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
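
The db1123 entries above show a staged repool after the reboot: 1%, 5%, 10%, 25%, 50%, 75% and 100%, with commits roughly 15 minutes apart. The following is only an illustrative sketch of that schedule. The function name `repool_schedule` is hypothetical, the 15-minute interval is inferred from the timestamps rather than stated anywhere in the log, and the code does not call dbctl or any other real tool:

```python
from datetime import datetime, timedelta

# Percentages as they appear in the db1123 "(re)pooling @ N%" entries.
STEPS = [1, 5, 10, 25, 50, 75, 100]


def repool_schedule(start, interval_minutes=15, steps=STEPS):
    """Return (HH:MM, percentage) pairs for a staged repool.

    `start` is the time of the first step; the interval is an assumption
    read off the log timestamps.
    """
    first = datetime.strptime(start, "%H:%M")
    step = timedelta(minutes=interval_minutes)
    return [
        ((first + i * step).strftime("%H:%M"), pct)
        for i, pct in enumerate(steps)
    ]


for when, pct in repool_schedule("13:36"):
    print(f"{when} repool at {pct}%")
```

The logged commits drift by a few seconds per step, so the final 100% entry appears at 15:07 in the log rather than at an exact multiple of 15 minutes. The db1106 entries follow the same pattern with a shorter list of steps (25, 50, 75, 100), which can be passed via `steps`.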

2022-03-10

  • 23:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 23:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 23:08 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 23:07 rzl@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:42 tstarling@deploy1002: Finished scap: global_edit_count gerrit 769561 (duration: 15m 12s)
  • 22:27 tstarling@deploy1002: Started scap: global_edit_count gerrit 769561
  • 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/User/CentralAuthUser.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:24 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/Hooks/Handlers/UserEditCountUpdateHookHandler.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:23 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthServices.php: global_edit_count gerrit 769561 (duration: 00m 47s)
  • 22:22 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/ServiceWiring.php: global_edit_count gerrit 769561 (duration: 00m 48s)
  • 22:21 tstarling@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/CentralAuth/includes/CentralAuthEditCounter.php: global_edit_count gerrit 769561 (duration: 00m 48s)
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:08 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:05 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 bking@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:04 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:02 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 22:02 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - bking@cumin1001 - T301955
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:41 rzl: UTC late B&C training window done
  • 21:39 rzl@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: CommonSettings: Update comment about Image Suggestions API (T294362) (duration: 00m 48s)
  • 21:34 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/modules/controller.js: Backport: Fix highlighting of comments when reloading (T303261) (duration: 00m 47s)
  • 21:33 rzl@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw: Backport: Preserve classes on media wrapper links (T292657 T303469) (duration: 00m 49s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 cstone: Donation Interface revision changed from ca37a93e to 5db12b21
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:13 rzl@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove centralauth-oversight from the config (T302675) (duration: 00m 49s)
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22356 and previous config saved to /var/cache/conftool/dbconfig/20220310-205114-marostegui.json
  • 20:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22355 and previous config saved to /var/cache/conftool/dbconfig/20220310-203608-marostegui.json
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P22354 and previous config saved to /var/cache/conftool/dbconfig/20220310-202103-marostegui.json
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22353 and previous config saved to /var/cache/conftool/dbconfig/20220310-200558-marostegui.json
  • 19:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:47 volans: installed spicerack v2.3.2 on the cumin hosts
  • 19:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:46 volans@cumin2002: END (PASS) - Cookbook sre.misc-clusters.sretest (exit_code=0) rolling restart_daemons on A:sretest
  • 19:46 volans@cumin2002: START - Cookbook sre.misc-clusters.sretest rolling restart_daemons on A:sretest
  • 19:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.38.0-wmf.25 refs T300201
  • 19:44 volans: uploaded spicerack_2.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 19:33 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 19:32 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 19:32 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 19:31 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 19:29 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 19:29 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:06 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:06 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300775)', diff saved to https://phabricator.wikimedia.org/P22352 and previous config saved to /var/cache/conftool/dbconfig/20220310-190544-marostegui.json
  • 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22351 and previous config saved to /var/cache/conftool/dbconfig/20220310-190530-marostegui.json
  • 19:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:02 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:00 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 18:59 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:59 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:58 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 18:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:57 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:56 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 18:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22350 and previous config saved to /var/cache/conftool/dbconfig/20220310-185025-marostegui.json
  • 18:46 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:41 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 moritzm: restarting thumbor to pick up tiff security updates
  • 18:40 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:40 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:39 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:36 moritzm: installing tiff security updates
  • 18:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P22349 and previous config saved to /var/cache/conftool/dbconfig/20220310-183520-marostegui.json
  • 18:33 mbsantos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 18:30 mbsantos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 18:29 mbsantos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 18:28 mbsantos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 18:27 mbsantos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 18:26 mbsantos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22348 and previous config saved to /var/cache/conftool/dbconfig/20220310-182015-marostegui.json
  • 18:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:19 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:19 razzi: cumin 'C:elasticsearch' 'systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service'
  • 18:15 razzi: systemctl restart prometheus-wmf-elasticsearch-exporter-9200.service on elastic2042 for T300295
  • 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 18:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 18:11 moritzm: installing cyrus-sasl2 security updates
  • 18:08 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 18:08 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:51 herron: repool thanos-fe1001
  • 17:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:43 herron: depooling thanos-fe1001 for envoy upgrade
  • 17:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:41 dancy@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: wmf-config: Use __DIR__ instead of "$IP/../wmf-config" (T45956) (duration: 00m 50s)
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:41 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:40 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:28 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:25 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1070.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1069.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:23 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1068.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22347 and previous config saved to /var/cache/conftool/dbconfig/20220310-172001-marostegui.json
  • 17:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22346 and previous config saved to /var/cache/conftool/dbconfig/20220310-171953-marostegui.json
  • 17:19 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1071.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22345 and previous config saved to /var/cache/conftool/dbconfig/20220310-170448-marostegui.json
  • 16:57 damilare: civicrm change revision from 9b5aafbc to 4cb2bdbc
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22344 and previous config saved to /var/cache/conftool/dbconfig/20220310-165014-ladsgroup.json
  • 16:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
  • 16:50 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.mgmt with reason: Testing alertmanager downtime
  • 16:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P22343 and previous config saved to /var/cache/conftool/dbconfig/20220310-164943-marostegui.json
  • 16:49 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:05:00 on D{cumin1001.mgmt} with reason: Testing alertmanager downtime
  • 16:49 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on D{cumin1001.mgmt} with reason: Testing alertmanager downtime
  • 16:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Testing alertmanager downtime
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22342 and previous config saved to /var/cache/conftool/dbconfig/20220310-163509-ladsgroup.json
  • 16:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22341 and previous config saved to /var/cache/conftool/dbconfig/20220310-163438-marostegui.json
  • 16:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
  • 16:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on doh1002.wikimedia.org with reason: testing eBPF filtering
  • 16:30 sukhe: depool doh1002 for testing eBPF
  • 16:21 volans: uploaded spicerack_2.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22340 and previous config saved to /var/cache/conftool/dbconfig/20220310-162004-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22339 and previous config saved to /var/cache/conftool/dbconfig/20220310-160457-ladsgroup.json
  • 15:57 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:56 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1121.eqiad.wmnet with OS bullseye
  • 15:47 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 15:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
  • 15:37 moritzm: rolling restart of thumbor to pick up expat security updates
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: host reimage
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22338 and previous config saved to /var/cache/conftool/dbconfig/20220310-153428-marostegui.json
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300775)', diff saved to https://phabricator.wikimedia.org/P22337 and previous config saved to /var/cache/conftool/dbconfig/20220310-153424-marostegui.json
  • 15:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 15:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22336 and previous config saved to /var/cache/conftool/dbconfig/20220310-153416-marostegui.json
  • 15:33 sukhe: upload certspotter 0.10-1wm1 to apt.wm.o - T204993
  • 15:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1121.eqiad.wmnet with OS bullseye
  • 15:21 moritzm: installing expat security updates on stretch
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22335 and previous config saved to /var/cache/conftool/dbconfig/20220310-151923-marostegui.json
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22334 and previous config saved to /var/cache/conftool/dbconfig/20220310-151910-marostegui.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22333 and previous config saved to /var/cache/conftool/dbconfig/20220310-150839-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22332 and previous config saved to /var/cache/conftool/dbconfig/20220310-150803-ladsgroup.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P22331 and previous config saved to /var/cache/conftool/dbconfig/20220310-150417-marostegui.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P22330 and previous config saved to /var/cache/conftool/dbconfig/20220310-150405-marostegui.json
  • 14:55 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:54 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22329 and previous config saved to /var/cache/conftool/dbconfig/20220310-145258-ladsgroup.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22328 and previous config saved to /var/cache/conftool/dbconfig/20220310-144911-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22327 and previous config saved to /var/cache/conftool/dbconfig/20220310-144900-marostegui.json
  • 14:44 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T298294)', diff saved to https://phabricator.wikimedia.org/P22326 and previous config saved to /var/cache/conftool/dbconfig/20220310-144222-marostegui.json
  • 14:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22325 and previous config saved to /var/cache/conftool/dbconfig/20220310-144214-marostegui.json
  • 14:41 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 14:41 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: new kernel
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P22324 and previous config saved to /var/cache/conftool/dbconfig/20220310-143753-ladsgroup.json
  • 14:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22323 and previous config saved to /var/cache/conftool/dbconfig/20220310-142709-marostegui.json
  • 14:26 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:25 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22322 and previous config saved to /var/cache/conftool/dbconfig/20220310-142248-ladsgroup.json
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P22321 and previous config saved to /var/cache/conftool/dbconfig/20220310-141204-marostegui.json
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:08 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=eqiad
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:08 akosiaris: repool ores in eqiad in discovery records
  • 14:06 urbanecm: UTC afternoon B&C done
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22320 and previous config saved to /var/cache/conftool/dbconfig/20220310-135659-marostegui.json
  • 13:55 akosiaris: depool ores in eqiad from discovery records to initiate reboot of rdb1011
  • 13:55 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=eqiad
  • 13:51 akosiaris: repool ores in codfw in discovery records
  • 13:50 akosiaris@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=ores,name=codfw
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T298294)', diff saved to https://phabricator.wikimedia.org/P22319 and previous config saved to /var/cache/conftool/dbconfig/20220310-135047-marostegui.json
  • 13:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22318 and previous config saved to /var/cache/conftool/dbconfig/20220310-135039-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300775)', diff saved to https://phabricator.wikimedia.org/P22317 and previous config saved to /var/cache/conftool/dbconfig/20220310-134807-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22316 and previous config saved to /var/cache/conftool/dbconfig/20220310-134759-marostegui.json
  • 13:43 akosiaris: reboot rdb2007 for upgrades
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22315 and previous config saved to /var/cache/conftool/dbconfig/20220310-133534-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22314 and previous config saved to /var/cache/conftool/dbconfig/20220310-133254-marostegui.json
  • 13:27 akosiaris: depool ores in codfw from discovery records to initiate reboot of rdb2007
  • 13:26 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=ores,name=codfw
  • 13:22 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T302950)', diff saved to https://phabricator.wikimedia.org/P22313 and previous config saved to /var/cache/conftool/dbconfig/20220310-132234-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 13:20 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P22311 and previous config saved to /var/cache/conftool/dbconfig/20220310-132029-marostegui.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P22310 and previous config saved to /var/cache/conftool/dbconfig/20220310-131748-marostegui.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22309 and previous config saved to /var/cache/conftool/dbconfig/20220310-131214-ladsgroup.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22308 and previous config saved to /var/cache/conftool/dbconfig/20220310-130523-marostegui.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22307 and previous config saved to /var/cache/conftool/dbconfig/20220310-130243-marostegui.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T298294)', diff saved to https://phabricator.wikimedia.org/P22306 and previous config saved to /var/cache/conftool/dbconfig/20220310-125909-marostegui.json
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22305 and previous config saved to /var/cache/conftool/dbconfig/20220310-125901-marostegui.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22304 and previous config saved to /var/cache/conftool/dbconfig/20220310-125709-ladsgroup.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22303 and previous config saved to /var/cache/conftool/dbconfig/20220310-124355-marostegui.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P22302 and previous config saved to /var/cache/conftool/dbconfig/20220310-124204-ladsgroup.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P22301 and previous config saved to /var/cache/conftool/dbconfig/20220310-122850-marostegui.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22300 and previous config saved to /var/cache/conftool/dbconfig/20220310-122659-ladsgroup.json
  • 12:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1141.eqiad.wmnet with OS bullseye
  • 12:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22299 and previous config saved to /var/cache/conftool/dbconfig/20220310-121344-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300775)', diff saved to https://phabricator.wikimedia.org/P22298 and previous config saved to /var/cache/conftool/dbconfig/20220310-120228-marostegui.json
  • 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22297 and previous config saved to /var/cache/conftool/dbconfig/20220310-120221-marostegui.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
  • 11:58 marostegui: Failover m1 master
  • 11:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1141.eqiad.wmnet with reason: host reimage
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Reboots
  • 11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 7 hosts with reason: Reboots
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22296 and previous config saved to /var/cache/conftool/dbconfig/20220310-114715-marostegui.json
  • 11:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1141.eqiad.wmnet with OS bullseye
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T302950)', diff saved to https://phabricator.wikimedia.org/P22294 and previous config saved to /var/cache/conftool/dbconfig/20220310-113638-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P22293 and previous config saved to /var/cache/conftool/dbconfig/20220310-113210-marostegui.json
  • 11:29 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b681376]: (no justification provided) (duration: 00m 07s)
  • 11:29 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b681376]: (no justification provided)
  • 11:26 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 11:25 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 11:25 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1093.eqiad.wmnet
  • 11:24 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:24 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 11:18 volans: rolled out python3-wmflib v1.1.2 to the entire fleet (buster+ only)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22292 and previous config saved to /var/cache/conftool/dbconfig/20220310-111705-marostegui.json
  • 11:16 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic1093.eqiad.wmnet
  • 11:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1001.wikimedia.org
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T298294)', diff saved to https://phabricator.wikimedia.org/P22291 and previous config saved to /var/cache/conftool/dbconfig/20220310-111330-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300775)', diff saved to https://phabricator.wikimedia.org/P22290 and previous config saved to /var/cache/conftool/dbconfig/20220310-111320-marostegui.json
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22289 and previous config saved to /var/cache/conftool/dbconfig/20220310-111313-marostegui.json
  • 11:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:10 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:10 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1001.wikimedia.org
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 14 hosts with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 14 hosts with reason: Maintenance
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:04 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22287 and previous config saved to /var/cache/conftool/dbconfig/20220310-110253-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22286 and previous config saved to /var/cache/conftool/dbconfig/20220310-105807-marostegui.json
  • 10:48 jbond: re-enable puppet fleet wide
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22285 and previous config saved to /var/cache/conftool/dbconfig/20220310-104748-marostegui.json
  • 10:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 10:44 akosiaris: reboot rdb2009 for upgrades
  • 10:44 jbond: disable puppet fleet wide
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P22284 and previous config saved to /var/cache/conftool/dbconfig/20220310-104302-marostegui.json
  • 10:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2010.codfw.wmnet with OS bullseye
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P22283 and previous config saved to /var/cache/conftool/dbconfig/20220310-103243-marostegui.json
  • 10:30 moritzm: failover ganeti master for drmrs/B13 to ganeti6004
  • 10:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22282 and previous config saved to /var/cache/conftool/dbconfig/20220310-102757-marostegui.json
  • 10:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2010.codfw.wmnet with reason: host reimage
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22281 and previous config saved to /var/cache/conftool/dbconfig/20220310-101738-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T298294)', diff saved to https://phabricator.wikimedia.org/P22280 and previous config saved to /var/cache/conftool/dbconfig/20220310-101133-marostegui.json
  • 10:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22279 and previous config saved to /var/cache/conftool/dbconfig/20220310-101125-marostegui.json
  • 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2010.codfw.wmnet with OS bullseye
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22278 and previous config saved to /var/cache/conftool/dbconfig/20220310-095620-marostegui.json
  • 09:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2009.codfw.wmnet with OS bullseye
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P22277 and previous config saved to /var/cache/conftool/dbconfig/20220310-094115-marostegui.json
  • 09:40 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 09:38 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2009.codfw.wmnet with reason: host reimage
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300775)', diff saved to https://phabricator.wikimedia.org/P22276 and previous config saved to /var/cache/conftool/dbconfig/20220310-092742-marostegui.json
  • 09:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22275 and previous config saved to /var/cache/conftool/dbconfig/20220310-092735-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22274 and previous config saved to /var/cache/conftool/dbconfig/20220310-092610-marostegui.json
  • 09:22 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2009.codfw.wmnet with OS bullseye
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T298294)', diff saved to https://phabricator.wikimedia.org/P22273 and previous config saved to /var/cache/conftool/dbconfig/20220310-091807-marostegui.json
  • 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22272 and previous config saved to /var/cache/conftool/dbconfig/20220310-091759-marostegui.json
  • 09:16 moritzm: failover ganeti master for drmrs/B12 to ganeti6003
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22271 and previous config saved to /var/cache/conftool/dbconfig/20220310-091230-marostegui.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22270 and previous config saved to /var/cache/conftool/dbconfig/20220310-090254-marostegui.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P22269 and previous config saved to /var/cache/conftool/dbconfig/20220310-085724-marostegui.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P22268 and previous config saved to /var/cache/conftool/dbconfig/20220310-084749-marostegui.json
  • 08:43 apergos: UTC morning backport and config window completed
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22267 and previous config saved to /var/cache/conftool/dbconfig/20220310-084219-marostegui.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22266 and previous config saved to /var/cache/conftool/dbconfig/20220310-084139-marostegui.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22265 and previous config saved to /var/cache/conftool/dbconfig/20220310-083732-root.json
  • 08:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22264 and previous config saved to /var/cache/conftool/dbconfig/20220310-083244-marostegui.json
  • 08:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P22263 and previous config saved to /var/cache/conftool/dbconfig/20220310-082737-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T298294)', diff saved to https://phabricator.wikimedia.org/P22262 and previous config saved to /var/cache/conftool/dbconfig/20220310-082642-marostegui.json
  • 08:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22261 and previous config saved to /var/cache/conftool/dbconfig/20220310-082634-marostegui.json
  • 08:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:24 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 2: SectionTranslation: Also add languages to target (T298237) (duration: 00m 49s)
  • 08:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300775)', diff saved to https://phabricator.wikimedia.org/P22260 and previous config saved to /var/cache/conftool/dbconfig/20220310-082234-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 10%: After reboot5', diff saved to https://phabricator.wikimedia.org/P22259 and previous config saved to /var/cache/conftool/dbconfig/20220310-082227-root.json
  • 08:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22258 and previous config saved to /var/cache/conftool/dbconfig/20220310-082223-marostegui.json
  • 08:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:19 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Part 1: Enable SectionTranslation on Javanese, Tagalog, Mongolian, Telugu WPs (T298237) (duration: 00m 50s)
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099 (s1, s8) for reboot', diff saved to https://phabricator.wikimedia.org/P22256 and previous config saved to /var/cache/conftool/dbconfig/20220310-081244-marostegui.json
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22255 and previous config saved to /var/cache/conftool/dbconfig/20220310-081129-marostegui.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22254 and previous config saved to /var/cache/conftool/dbconfig/20220310-080718-marostegui.json
  • 08:03 marostegui: Reboot dbproxy1017, 1016 T303174
  • 08:00 marostegui: Reboot dbproxy1012, 1015, 1016 T303174
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P22253 and previous config saved to /var/cache/conftool/dbconfig/20220310-075623-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P22252 and previous config saved to /var/cache/conftool/dbconfig/20220310-075213-marostegui.json
  • 07:43 marostegui: Reboot dbproxy2001, 2002, 2003, 2004 T303174
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22251 and previous config saved to /var/cache/conftool/dbconfig/20220310-074118-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22250 and previous config saved to /var/cache/conftool/dbconfig/20220310-073708-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T298294)', diff saved to https://phabricator.wikimedia.org/P22249 and previous config saved to /var/cache/conftool/dbconfig/20220310-073523-marostegui.json
  • 07:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22248 and previous config saved to /var/cache/conftool/dbconfig/20220310-073022-marostegui.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300775)', diff saved to https://phabricator.wikimedia.org/P22247 and previous config saved to /var/cache/conftool/dbconfig/20220310-072124-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: Maintenance
  • 07:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22246 and previous config saved to /var/cache/conftool/dbconfig/20220310-072019-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22245 and previous config saved to /var/cache/conftool/dbconfig/20220310-071516-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22244 and previous config saved to /var/cache/conftool/dbconfig/20220310-070514-marostegui.json
  • 07:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1132.eqiad.wmnet with OS bullseye
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P22243 and previous config saved to /var/cache/conftool/dbconfig/20220310-070011-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P22242 and previous config saved to /var/cache/conftool/dbconfig/20220310-065009-marostegui.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22241 and previous config saved to /var/cache/conftool/dbconfig/20220310-064506-marostegui.json
  • 06:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1132.eqiad.wmnet with reason: host reimage
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22240 and previous config saved to /var/cache/conftool/dbconfig/20220310-063858-marostegui.json
  • 06:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22239 and previous config saved to /var/cache/conftool/dbconfig/20220310-063850-marostegui.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22238 and previous config saved to /var/cache/conftool/dbconfig/20220310-063503-marostegui.json
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1132.eqiad.wmnet with OS bullseye
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300775)', diff saved to https://phabricator.wikimedia.org/P22237 and previous config saved to /var/cache/conftool/dbconfig/20220310-063017-marostegui.json
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22236 and previous config saved to /var/cache/conftool/dbconfig/20220310-062345-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P22235 and previous config saved to /var/cache/conftool/dbconfig/20220310-060840-marostegui.json
  • 06:07 marostegui: dbmaint on s3@eqiad T272512
  • 06:05 marostegui: dbmaint on s7@eqiad T272512
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22234 and previous config saved to /var/cache/conftool/dbconfig/20220310-055335-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T298294)', diff saved to https://phabricator.wikimedia.org/P22233 and previous config saved to /var/cache/conftool/dbconfig/20220310-054701-marostegui.json
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui: dbmaint on s5@eqiad T272512
  • 05:46 marostegui: dbmaint on s4@eqiad T272512
  • 05:46 marostegui: dbmaint on pc3@eqiad T272512
  • 05:45 marostegui: dbmaint on pc2@eqiad T272512
  • 05:45 marostegui: dbmaint on pc1@eqiad T272512
  • 05:45 marostegui: dbmaint on s2@eqiad T272512
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22232 and previous config saved to /var/cache/conftool/dbconfig/20220310-053950-marostegui.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 00:26 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@7975c27]: (no justification provided) (duration: 00m 08s)
  • 00:26 ebysans@deploy1002: Started deploy [airflow-dags/analytics@7975c27]: (no justification provided)
  • 00:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-03-09

  • 23:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:09 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs T300201 (duration: 00m 49s)
  • 23:08 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:08 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:08 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs T300201
  • 23:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1047.eqiad.wmnet
  • 22:59 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
  • 22:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1047.eqiad.wmnet
  • 22:54 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1047.eqiad.wmnet
  • 22:35 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:35 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22229 and previous config saved to /var/cache/conftool/dbconfig/20220309-223130-marostegui.json
  • 22:15 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22228 and previous config saved to /var/cache/conftool/dbconfig/20220309-221555-marostegui.json
  • 22:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P22226 and previous config saved to /var/cache/conftool/dbconfig/20220309-220020-marostegui.json
  • 21:57 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/Gadgets: T303455 (duration: 00m 50s)
  • 21:54 volans: uploaded python3-wmflib_1.1.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 21:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 21:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22225 and previous config saved to /var/cache/conftool/dbconfig/20220309-214445-marostegui.json
  • 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 21:10 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 21:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:48 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation restart without plugin upgrade (1 nodes at a time) for ElasticSearch cluster relforge: relforge cluster restart - ryankemper@cumin1001 - T301955
  • 20:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 20:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:00 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:54 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 19:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:21 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300201 (duration: 00m 50s)
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:20 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300201
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 dancy@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.25 refs T300201 (duration: 00m 49s)
  • 19:06 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.25 refs T300201
  • 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1127 (T300775)', diff saved to https://phabricator.wikimedia.org/P22222 and previous config saved to /var/cache/conftool/dbconfig/20220309-182355-marostegui.json
  • 18:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22221 and previous config saved to /var/cache/conftool/dbconfig/20220309-182316-marostegui.json
  • 18:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22220 and previous config saved to /var/cache/conftool/dbconfig/20220309-180741-marostegui.json
  • 17:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22219 and previous config saved to /var/cache/conftool/dbconfig/20220309-175205-marostegui.json
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:41 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 17:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22217 and previous config saved to /var/cache/conftool/dbconfig/20220309-173630-marostegui.json
  • 17:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bullseye
  • 17:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:29 reedy@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/WebAuthn/: T303404 (duration: 00m 53s)
  • 17:29 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:28 reedy@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/WebAuthn/: T303404 (duration: 00m 51s)
  • 17:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2008.codfw.wmnet with OS bullseye
  • 17:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 17:01 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2008.codfw.wmnet with reason: host reimage
  • 16:56 akosiaris: reboot rdb[2008,2010].codfw.wmnet,rdb[1010,1012].eqiad.wmnet for upgrades
  • 16:49 akosiaris: reboot rdb2008 for upgrades
  • 16:45 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2008.codfw.wmnet with OS bullseye
  • 16:22 moritzm: installing 5.10.103 kernels on bullseye hosts
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1001.eqiad.wmnet
  • 16:00 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/includes/parser/Sanitizer.php: 31189c6: Ensure that the recognizedTagData static cache is properly initialized (T303360) (duration: 00m 51s)
  • 15:56 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:56 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1001.eqiad.wmnet
  • 15:33 jbond: deploy gerrit:740818 to add more general rate limits for crawling cached and upload pages
  • 15:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2007.codfw.wmnet with OS bullseye
  • 15:28 volans: uploaded spicerack_2.3.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 15:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2007.codfw.wmnet with reason: host reimage
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:06 taavi: UTC afternoon deploys done
  • 15:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:06 awight@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: Fix missing padding on inline descriptions (T303386) (duration: 00m 49s)
  • 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
  • 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 6 hosts with reason: Maintenance
  • 15:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22215 and previous config saved to /var/cache/conftool/dbconfig/20220309-150523-marostegui.json
  • 15:03 awight@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/VisualEditor/modules/ve-mw/ui/styles/pages/ve.ui.MWParameterPage.css: Backport: Fix missing padding on inline descriptions (T303386) (duration: 00m 49s)
  • 15:01 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2007.codfw.wmnet with OS bullseye
  • 15:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:58 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/extension.json: Backport: wmf.24 HACK: Add forward class alias for Gadget (T303391) (2/2) (duration: 00m 49s)
  • 14:57 taavi@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Gadgets/includes: Backport: wmf.24 HACK: Add forward class alias for Gadget (T303391) (1/2) (duration: 00m 50s)
  • 14:55 volans@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
  • 14:54 volans@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin1001.eqiad.wmnet with reason: Release v0.4.0 to reimaged cumin1001 - volans@cumin1001
  • 14:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22213 and previous config saved to /var/cache/conftool/dbconfig/20220309-144948-marostegui.json
  • 14:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22212 and previous config saved to /var/cache/conftool/dbconfig/20220309-143413-marostegui.json
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:27 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Add IPInfo viewing rights for certain groups (T296499) (no-op on prod) (duration: 00m 50s)
  • 14:18 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22211 and previous config saved to /var/cache/conftool/dbconfig/20220309-141837-marostegui.json
  • 14:13 damilare: civicrm revision changed from cb0605ed to 9b5aafbc
  • 14:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1112 (T298294)', diff saved to https://phabricator.wikimedia.org/P22210 and previous config saved to /var/cache/conftool/dbconfig/20220309-140158-marostegui.json
  • 14:01 marostegui: Failover m5 from db1132 to db1107 - T302190
  • 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 13:59 btullis: restarting pybal on lvs1019 T301458
  • 13:51 btullis: restarting pybal on lvs102 T301458
  • 13:47 marostegui: dbmaint on s8@eqiad T272512
  • 13:46 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1101:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22209 and previous config saved to /var/cache/conftool/dbconfig/20220309-134631-marostegui.json
  • 13:45 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22208 and previous config saved to /var/cache/conftool/dbconfig/20220309-134552-marostegui.json
  • 13:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22207 and previous config saved to /var/cache/conftool/dbconfig/20220309-134235-marostegui.json
  • 13:30 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22206 and previous config saved to /var/cache/conftool/dbconfig/20220309-133017-marostegui.json
  • 13:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22205 and previous config saved to /var/cache/conftool/dbconfig/20220309-132700-marostegui.json
  • 13:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P22204 and previous config saved to /var/cache/conftool/dbconfig/20220309-131442-marostegui.json
  • 13:11 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22203 and previous config saved to /var/cache/conftool/dbconfig/20220309-131124-marostegui.json
  • 12:59 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22202 and previous config saved to /var/cache/conftool/dbconfig/20220309-125907-marostegui.json
  • 12:56 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
  • 12:56 jmm@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest[1001-1002].eqiad.wmnet with reason: just a test
  • 12:55 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22201 and previous config saved to /var/cache/conftool/dbconfig/20220309-125549-marostegui.json
  • 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1001.eqiad.wmnet with OS bullseye
  • 12:26 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:25 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1179 (T298294)', diff saved to https://phabricator.wikimedia.org/P22200 and previous config saved to /var/cache/conftool/dbconfig/20220309-122536-marostegui.json
  • 12:25 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:24 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22199 and previous config saved to /var/cache/conftool/dbconfig/20220309-120554-marostegui.json
  • 11:50 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22198 and previous config saved to /var/cache/conftool/dbconfig/20220309-115019-marostegui.json
  • 11:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:43 awight: sketchy EU deployment complete.
  • 11:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:42 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Syntax highlighting color scheme update on all wikis except enwiki (T280024) (duration: 00m 50s)
  • 11:41 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 11:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:37 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Bracket matching on all wikis except enwiki (T280023) (duration: 00m 49s)
  • 11:34 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22197 and previous config saved to /var/cache/conftool/dbconfig/20220309-113442-marostegui.json
  • 11:32 awight@deploy1002: Synchronized wmf-config/: Config: VE template expanded sidebar and inline descriptions on all wikis except enwiki (T286991) (duration: 00m 51s)
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1001.eqiad.wmnet with reason: host reimage
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:19 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22195 and previous config saved to /var/cache/conftool/dbconfig/20220309-111907-marostegui.json
  • 11:17 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: VE template back and delete button on all wikis except enwiki (T286990) (duration: 00m 50s)
  • 11:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1001.eqiad.wmnet with OS bullseye
  • 11:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:11 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Template search improvements to all wikis except enwiki (T286990) (duration: 00m 51s)
  • 11:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1016.eqiad.wmnet
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:51 btullis@cumin2002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2001.wikimedia.org
  • 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 10:39 btullis@cumin2002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 10:32 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1175 (T298294)', diff saved to https://phabricator.wikimedia.org/P22194 and previous config saved to /var/cache/conftool/dbconfig/20220309-103226-marostegui.json
  • 10:31 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:31 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22193 and previous config saved to /var/cache/conftool/dbconfig/20220309-103146-marostegui.json
  • 10:29 marostegui: dbmaint on s6@eqiad T272512
  • 10:29 marostegui: dbmaint on s3@eqiad T298295
  • 10:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 10:16 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22192 and previous config saved to /var/cache/conftool/dbconfig/20220309-101610-marostegui.json
  • 10:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 10:08 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: reenable DPL on nowikimedia (duration: 00m 51s)
  • 10:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22191 and previous config saved to /var/cache/conftool/dbconfig/20220309-100036-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2147', diff saved to https://phabricator.wikimedia.org/P22190 and previous config saved to /var/cache/conftool/dbconfig/20220309-094704-marostegui.json
  • 09:45 marostegui: dbmaint on s7@eqiad T298295
  • 09:45 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22189 and previous config saved to /var/cache/conftool/dbconfig/20220309-094501-marostegui.json
  • 09:31 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1098:3317 (T300775)', diff saved to https://phabricator.wikimedia.org/P22188 and previous config saved to /var/cache/conftool/dbconfig/20220309-093119-marostegui.json
  • 09:30 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:27 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1166 (T298294)', diff saved to https://phabricator.wikimedia.org/P22187 and previous config saved to /var/cache/conftool/dbconfig/20220309-092731-marostegui.json
  • 09:26 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:26 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:23 marostegui: dbmaint on s2@eqiad T298295
  • 09:18 marostegui: dbmaint on s1@eqiad T298295
  • 09:16 marostegui: dbmaint on s4@eqiad T298295
  • 09:07 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22186 and previous config saved to /var/cache/conftool/dbconfig/20220309-090737-marostegui.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22184 and previous config saved to /var/cache/conftool/dbconfig/20220309-085201-marostegui.json
  • 08:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:46 XioNoX: Redirect one of Microsoft's range to codfw - T282861
  • 08:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host dumpsdata1007.eqiad.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 08:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P22183 and previous config saved to /var/cache/conftool/dbconfig/20220309-083626-marostegui.json
  • 08:21 marostegui: dbmaint on s3@eqiad T300380
  • 08:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22182 and previous config saved to /var/cache/conftool/dbconfig/20220309-082051-marostegui.json
  • 08:11 marostegui: dbmaint on s7@eqiad T300380
  • 08:03 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1123 (T298294)', diff saved to https://phabricator.wikimedia.org/P22181 and previous config saved to /var/cache/conftool/dbconfig/20220309-080307-marostegui.json
  • 08:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 40%: After schema change', diff saved to https://phabricator.wikimedia.org/P22180 and previous config saved to /var/cache/conftool/dbconfig/20220309-075704-root.json
  • 07:55 marostegui: dbmaint on s2@eqiad T300380
  • 07:49 marostegui: dbmaint on s8@eqiad T300380
  • 07:49 marostegui: dbmaint on s4@eqiad T300380
  • 07:42 marostegui: dbmaint on s1@eqiad T300380
  • 07:42 marostegui: dbmaint on s6@eqiad T300380
  • 07:42 marostegui: dbmaint on s5@eqiad T300380
  • 07:42 marostegui: dbmaint on s5 T300380
  • 07:42 marostegui: dbmaint on s6 T300380
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22179 and previous config saved to /var/cache/conftool/dbconfig/20220309-074200-root.json
  • 07:41 marostegui: dbmaint on s1 T300380
  • 07:41 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22178 and previous config saved to /var/cache/conftool/dbconfig/20220309-074107-root.json
  • 07:34 marostegui: dbmaint on s7@eqiad T300775
  • 07:33 marostegui: dbmaint on db1123 s3@eqiad T300600
  • 07:31 elukey: manually sync pcc facts following https://wikitech.wikimedia.org/wiki/Help:Puppet-compiler#Manually_update_production
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 15%: After schema change', diff saved to https://phabricator.wikimedia.org/P22177 and previous config saved to /var/cache/conftool/dbconfig/20220309-072656-root.json
  • 07:25 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22176 and previous config saved to /var/cache/conftool/dbconfig/20220309-072540-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: After schema change', diff saved to https://phabricator.wikimedia.org/P22175 and previous config saved to /var/cache/conftool/dbconfig/20220309-071153-root.json
  • 07:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22174 and previous config saved to /var/cache/conftool/dbconfig/20220309-071014-root.json
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1123.eqiad.wmnet with OS bullseye
  • 06:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
  • 06:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22173 and previous config saved to /var/cache/conftool/dbconfig/20220309-065447-root.json
  • 06:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: host reimage
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1123.eqiad.wmnet with OS bullseye
  • 06:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22172 and previous config saved to /var/cache/conftool/dbconfig/20220309-062010-marostegui.json
  • 06:19 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:06 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22171 and previous config saved to /var/cache/conftool/dbconfig/20220309-014831-marostegui.json
  • 01:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22170 and previous config saved to /var/cache/conftool/dbconfig/20220309-013256-marostegui.json
  • 01:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P22169 and previous config saved to /var/cache/conftool/dbconfig/20220309-011721-marostegui.json
  • 01:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22168 and previous config saved to /var/cache/conftool/dbconfig/20220309-010146-marostegui.json
  • 00:53 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1105:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22167 and previous config saved to /var/cache/conftool/dbconfig/20220309-005325-marostegui.json
  • 00:52 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22166 and previous config saved to /var/cache/conftool/dbconfig/20220309-005245-marostegui.json
  • 00:37 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22165 and previous config saved to /var/cache/conftool/dbconfig/20220309-003710-marostegui.json
  • 00:21 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P22164 and previous config saved to /var/cache/conftool/dbconfig/20220309-002135-marostegui.json
  • 00:06 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22163 and previous config saved to /var/cache/conftool/dbconfig/20220309-000600-marostegui.json
  • 00:02 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1182 (T298294)', diff saved to https://phabricator.wikimedia.org/P22162 and previous config saved to /var/cache/conftool/dbconfig/20220309-000250-marostegui.json
  • 00:02 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:02 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 00:00 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22161 and previous config saved to /var/cache/conftool/dbconfig/20220309-000025-marostegui.json

2022-03-08

  • 23:44 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22160 and previous config saved to /var/cache/conftool/dbconfig/20220308-234450-marostegui.json
  • 23:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P22159 and previous config saved to /var/cache/conftool/dbconfig/20220308-232915-marostegui.json
  • 23:13 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22158 and previous config saved to /var/cache/conftool/dbconfig/20220308-231340-marostegui.json
  • 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:10 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1170:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22157 and previous config saved to /var/cache/conftool/dbconfig/20220308-231028-marostegui.json
  • 23:09 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22156 and previous config saved to /var/cache/conftool/dbconfig/20220308-230949-marostegui.json
  • 23:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22155 and previous config saved to /var/cache/conftool/dbconfig/20220308-225413-marostegui.json
  • 22:38 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P22153 and previous config saved to /var/cache/conftool/dbconfig/20220308-223838-marostegui.json
  • 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:24 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.25 refs T300201
  • 22:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22152 and previous config saved to /var/cache/conftool/dbconfig/20220308-222303-marostegui.json
  • 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1162 (T298294)', diff saved to https://phabricator.wikimedia.org/P22151 and previous config saved to /var/cache/conftool/dbconfig/20220308-222055-marostegui.json
  • 22:20 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 22:20 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 22:20 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22150 and previous config saved to /var/cache/conftool/dbconfig/20220308-222016-marostegui.json
  • 22:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22149 and previous config saved to /var/cache/conftool/dbconfig/20220308-220441-marostegui.json
  • 21:49 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P22148 and previous config saved to /var/cache/conftool/dbconfig/20220308-214906-marostegui.json
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.dhcp for host cloudvirt1016.eqiad.wmnet
  • 21:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22147 and previous config saved to /var/cache/conftool/dbconfig/20220308-213331-marostegui.json
  • 21:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:33 urbanecm: UTC early B&C window done
  • 21:32 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 21:30 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1156 (T298294)', diff saved to https://phabricator.wikimedia.org/P22146 and previous config saved to /var/cache/conftool/dbconfig/20220308-213024-marostegui.json
  • 21:29 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 21:29 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22145 and previous config saved to /var/cache/conftool/dbconfig/20220308-212939-marostegui.json
  • 21:28 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/DiscussionTools/includes/ApiDiscussionToolsEdit.php: cc5acc2: Fix handling of disabled mobileformat (T303262) (duration: 00m 49s)
  • 21:26 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.25/extensions/VisualEditor/includes/ApiVisualEditorEdit.php: a5c6d06: Fix handling of disabled mobileformat (T303262) (duration: 00m 49s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22144 and previous config saved to /var/cache/conftool/dbconfig/20220308-211404-marostegui.json
  • 21:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3132fca: Enable DiscussionTools autotopicsub on MediaWiki.org (T302256) (duration: 00m 49s)
  • 21:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.22 (duration: 01m 28s)
  • 21:01 dancy@deploy1002: Pruned MediaWiki: 1.38.0-wmf.23 (duration: 01m 46s)
  • 20:59 dancy@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.25 refs T300201 (duration: 32m 13s)
  • 20:58 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P22143 and previous config saved to /var/cache/conftool/dbconfig/20220308-205829-marostegui.json
  • 20:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22142 and previous config saved to /var/cache/conftool/dbconfig/20220308-204254-marostegui.json
  • 20:36 rzl: rzl@apt1001:~$ sudo -i reprepro copy bullseye-wikimedia buster-wikimedia envoyproxy # T300324
  • 20:36 rzl: rzl@apt1001:~$ sudo -i reprepro copy stretch-wikimedia buster-wikimedia envoyproxy # T300324
  • 20:27 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.25 refs T300201
  • 20:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 19:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 19:52 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 19:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 19:43 XioNoX: push DHCP term to labs-in filters on eqiad cr
  • 19:42 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1146:3312 (T298294)', diff saved to https://phabricator.wikimedia.org/P22139 and previous config saved to /var/cache/conftool/dbconfig/20220308-194159-marostegui.json
  • 19:41 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 19:39 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22138 and previous config saved to /var/cache/conftool/dbconfig/20220308-193930-marostegui.json
  • 19:36 cstone: updated donorwiki revision from 73de4731 to ca37a93e
  • 19:32 dancy@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.25 refs T300201
  • 19:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22137 and previous config saved to /var/cache/conftool/dbconfig/20220308-192354-marostegui.json
  • 19:21 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P22136 and previous config saved to /var/cache/conftool/dbconfig/20220308-190818-marostegui.json
  • 18:55 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:53 ejegg: updated payments-wiki from 3dfac3b2 to ca37a93e
  • 18:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22135 and previous config saved to /var/cache/conftool/dbconfig/20220308-185242-marostegui.json
  • 18:50 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1129 (T298294)', diff saved to https://phabricator.wikimedia.org/P22134 and previous config saved to /var/cache/conftool/dbconfig/20220308-185033-marostegui.json
  • 18:49 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 18:49 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 18:49 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp5004.eqsin.wmnet
  • 18:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:48 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 18:47 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 18:47 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 18:47 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 18:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1085.eqiad.wmnet
  • 18:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:35 bblack: cp10[3579] - restarting varnish-fe
  • 18:29 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1021.eqiad.wmnet with OS bullseye
  • 18:27 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS buster
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1012.eqiad.wmnet with OS stretch
  • 18:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 18:11 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 18:10 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 18:07 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1012.eqiad.wmnet with reason: host reimage
  • 17:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 17:57 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1012.eqiad.wmnet with OS stretch
  • 17:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22133 and previous config saved to /var/cache/conftool/dbconfig/20220308-174838-marostegui.json
  • 17:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22132 and previous config saved to /var/cache/conftool/dbconfig/20220308-173302-marostegui.json
  • 17:27 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1017.eqiad.wmnet with OS bullseye
  • 17:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P22131 and previous config saved to /var/cache/conftool/dbconfig/20220308-171728-marostegui.json
  • 17:07 jbond: deploy minor clean up of puppetmaster classes gerrit:769072
  • 17:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22130 and previous config saved to /var/cache/conftool/dbconfig/20220308-170153-marostegui.json
  • 17:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:58 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1161 (T298294)', diff saved to https://phabricator.wikimedia.org/P22129 and previous config saved to /var/cache/conftool/dbconfig/20220308-165843-marostegui.json
  • 16:58 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:57 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:56 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:54 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:54 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22128 and previous config saved to /var/cache/conftool/dbconfig/20220308-165436-marostegui.json
  • 16:54 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:54 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 16:53 inflatador: bking@deneb manually installed tox for T293862. moritzm will add puppet patch for this
  • 16:53 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on 10 hosts with reason: Maintenance
  • 16:53 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:53 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:46 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1016.eqiad.wmnet with OS bullseye
  • 16:39 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22127 and previous config saved to /var/cache/conftool/dbconfig/20220308-163901-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22126 and previous config saved to /var/cache/conftool/dbconfig/20220308-163835-root.json
  • 16:34 rzl: rzl@apt1001:~$ sudo -i reprepro -C main includedeb buster-wikimedia /home/rzl/envoyproxy_1.18.3-1_amd64.deb # reimporting from component/envoy-future into main, for T300324
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22125 and previous config saved to /var/cache/conftool/dbconfig/20220308-162331-root.json
  • 16:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P22124 and previous config saved to /var/cache/conftool/dbconfig/20220308-162326-marostegui.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22123 and previous config saved to /var/cache/conftool/dbconfig/20220308-160815-root.json
  • 16:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22122 and previous config saved to /var/cache/conftool/dbconfig/20220308-160751-marostegui.json
  • 16:05 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1113:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22121 and previous config saved to /var/cache/conftool/dbconfig/20220308-160542-marostegui.json
  • 16:05 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:05 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 16:04 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 16:04 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22120 and previous config saved to /var/cache/conftool/dbconfig/20220308-160416-marostegui.json
  • 16:02 inflatador: bking@deneb manually installed openjdk-11-jdk for T293862. moritzm will add puppet patch for this
  • 15:55 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22119 and previous config saved to /var/cache/conftool/dbconfig/20220308-155312-root.json
  • 15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:48 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22118 and previous config saved to /var/cache/conftool/dbconfig/20220308-154841-marostegui.json
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P22117 and previous config saved to /var/cache/conftool/dbconfig/20220308-154507-marostegui.json
  • 15:42 XioNoX: update capirca hosts definitions
  • 15:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22116 and previous config saved to /var/cache/conftool/dbconfig/20220308-154232-root.json
  • 15:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:33 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P22115 and previous config saved to /var/cache/conftool/dbconfig/20220308-153306-marostegui.json
  • 15:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22114 and previous config saved to /var/cache/conftool/dbconfig/20220308-151731-marostegui.json
  • 15:14 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1100 (T298294)', diff saved to https://phabricator.wikimedia.org/P22113 and previous config saved to /var/cache/conftool/dbconfig/20220308-151446-marostegui.json
  • 15:14 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 15:14 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22112 and previous config saved to /var/cache/conftool/dbconfig/20220308-151406-marostegui.json
  • 14:58 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22111 and previous config saved to /var/cache/conftool/dbconfig/20220308-145831-marostegui.json
  • 14:42 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P22110 and previous config saved to /var/cache/conftool/dbconfig/20220308-144256-marostegui.json
  • 14:33 urbanecm: UTC afternoon B&C window done
  • 14:32 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/DiscussionTools/includes/Notifications/DiscussionToolsEventTrait.php: 23939c7: Fix logic for finding the oldest comment in a bundle (T302014) (duration: 00m 50s)
  • 14:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:27 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22109 and previous config saved to /var/cache/conftool/dbconfig/20220308-142721-marostegui.json
  • 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:24 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1110 (T298294)', diff saved to https://phabricator.wikimedia.org/P22108 and previous config saved to /var/cache/conftool/dbconfig/20220308-142412-marostegui.json
  • 14:23 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22107 and previous config saved to /var/cache/conftool/dbconfig/20220308-142332-marostegui.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:07 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22104 and previous config saved to /var/cache/conftool/dbconfig/20220308-140758-marostegui.json
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:06 dcaro@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1003.wikimedia.org
  • 14:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 75465dd: fawiki: Add patrolmarks right to autopatrolled group (T303269) (duration: 00m 49s)
  • 13:56 aqu@deploy1002: Finished deploy [airflow-dags/analytics@d1c8ae0]: Fix wikidata_item_page_link destination table after tests (duration: 00m 07s)
  • 13:56 aqu@deploy1002: Started deploy [airflow-dags/analytics@d1c8ae0]: Fix wikidata_item_page_link destination table after tests
  • 13:52 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P22103 and previous config saved to /var/cache/conftool/dbconfig/20220308-135223-marostegui.json
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 13:46 dcaro@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1003.wikimedia.org
  • 13:40 aqu@deploy1002: Finished deploy [airflow-dags/analytics@725f528]: Set wikidata/item_page_link/weekly start date in production (duration: 00m 07s)
  • 13:40 aqu@deploy1002: Started deploy [airflow-dags/analytics@725f528]: Set wikidata/item_page_link/weekly start date in production
  • 13:40 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 13:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 13:36 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22102 and previous config saved to /var/cache/conftool/dbconfig/20220308-133647-marostegui.json
  • 13:34 btullis@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:33 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1144:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22101 and previous config saved to /var/cache/conftool/dbconfig/20220308-133335-marostegui.json
  • 13:33 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:33 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22100 and previous config saved to /var/cache/conftool/dbconfig/20220308-133255-marostegui.json
  • 13:31 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 13:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 13:17 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1004.wikimedia.org
  • 13:17 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 13:17 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22099 and previous config saved to /var/cache/conftool/dbconfig/20220308-131720-marostegui.json
  • 13:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300775)', diff saved to https://phabricator.wikimedia.org/P22098 and previous config saved to /var/cache/conftool/dbconfig/20220308-131420-marostegui.json
  • 13:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22097 and previous config saved to /var/cache/conftool/dbconfig/20220308-131309-root.json
  • 13:09 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 13:07 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1004.wikimedia.org
  • 13:07 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1005.wikimedia.org
  • 13:01 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P22096 and previous config saved to /var/cache/conftool/dbconfig/20220308-130145-marostegui.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22095 and previous config saved to /var/cache/conftool/dbconfig/20220308-125806-root.json
  • 12:57 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 12:56 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcephosd1021.eqiad.wmnet
  • 12:51 aborrero@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 12:46 btullis@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:46 marostegui@cumin2002: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22094 and previous config saved to /var/cache/conftool/dbconfig/20220308-124610-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22093 and previous config saved to /var/cache/conftool/dbconfig/20220308-124302-root.json
  • 12:42 marostegui@cumin2002: dbctl commit (dc=all): 'Depooling db1096:3315 (T298294)', diff saved to https://phabricator.wikimedia.org/P22092 and previous config saved to /var/cache/conftool/dbconfig/20220308-124257-marostegui.json
  • 12:42 btullis@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 12:42 marostegui@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:42 marostegui@cumin2002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22091 and previous config saved to /var/cache/conftool/dbconfig/20220308-122752-root.json
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 8 hosts with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on 8 hosts with reason: Maintenance
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22090 and previous config saved to /var/cache/conftool/dbconfig/20220308-121443-marostegui.json
  • 12:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P22089 and previous config saved to /var/cache/conftool/dbconfig/20220308-115938-marostegui.json
  • 11:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:58 volans: uploaded spicerack_2.2.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:55 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: Use namespaced ApiFeatureUsageQueryEngineElastica T302907 (duration: 00m 49s)
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 11:51 btullis@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1083.eqiad.wmnet with OS buster
  • 11:48 vgutierrez: pool cp1083 with HAProxy as TLS termination layer - T290005
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P22088 and previous config saved to /var/cache/conftool/dbconfig/20220308-114434-marostegui.json
  • 11:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22086 and previous config saved to /var/cache/conftool/dbconfig/20220308-113424-root.json
  • 11:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2008.codfw.wmnet
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T300381)', diff saved to https://phabricator.wikimedia.org/P22085 and previous config saved to /var/cache/conftool/dbconfig/20220308-113110-marostegui.json
  • 11:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22084 and previous config saved to /var/cache/conftool/dbconfig/20220308-113102-marostegui.json
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22083 and previous config saved to /var/cache/conftool/dbconfig/20220308-112929-marostegui.json
  • 11:29 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T298294)', diff saved to https://phabricator.wikimedia.org/P22082 and previous config saved to /var/cache/conftool/dbconfig/20220308-112811-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22081 and previous config saved to /var/cache/conftool/dbconfig/20220308-112804-marostegui.json
  • 11:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:25 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2008.codfw.wmnet
  • 11:25 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2007.codfw.wmnet
  • 11:20 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 11:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22080 and previous config saved to /var/cache/conftool/dbconfig/20220308-111920-root.json
  • 11:18 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2007.codfw.wmnet
  • 11:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22079 and previous config saved to /var/cache/conftool/dbconfig/20220308-111558-marostegui.json
  • 11:15 XioNoX: Cleanup transport-in filters for codfw/eqiad (CR747551)
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P22078 and previous config saved to /var/cache/conftool/dbconfig/20220308-111259-marostegui.json
  • 11:12 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2006.codfw.wmnet
  • 11:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS buster
  • 11:10 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1083.eqiad.wmnet with OS buster
  • 11:09 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:08 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2003.codfw.wmnet
  • 11:06 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 11:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:05 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2006.codfw.wmnet
  • 11:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22077 and previous config saved to /var/cache/conftool/dbconfig/20220308-110416-root.json
  • 11:03 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2003.codfw.wmnet
  • 11:03 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1083.eqiad.wmnet with reason: host reimage
  • 11:02 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 11:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2005.codfw.wmnet
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P22076 and previous config saved to /var/cache/conftool/dbconfig/20220308-110053-marostegui.json
  • 10:59 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:59 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 10:59 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 10:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P22075 and previous config saved to /var/cache/conftool/dbconfig/20220308-105754-marostegui.json
  • 10:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2002.codfw.wmnet
  • 10:54 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2005.codfw.wmnet
  • 10:52 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2002.codfw.wmnet
  • 10:52 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-cache2001.codfw.wmnet
  • 10:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 10:51 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 10:51 btullis: btullis@datahubsearch1001:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22074 and previous config saved to /var/cache/conftool/dbconfig/20220308-104913-root.json
  • 10:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 10:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1083.eqiad.wmnet with OS buster
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22073 and previous config saved to /var/cache/conftool/dbconfig/20220308-104548-marostegui.json
  • 10:45 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-cache2001.codfw.wmnet
  • 10:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22072 and previous config saved to /var/cache/conftool/dbconfig/20220308-104250-marostegui.json
  • 10:39 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2035.codfw.wmnet with OS buster
  • 10:39 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 10:36 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 10:34 vgutierrez: pool cp2035 with HAProxy as TLS termination layer - T290005
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22070 and previous config saved to /var/cache/conftool/dbconfig/20220308-103409-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22069 and previous config saved to /var/cache/conftool/dbconfig/20220308-103251-marostegui.json
  • 10:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22068 and previous config saved to /var/cache/conftool/dbconfig/20220308-103243-marostegui.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T300381)', diff saved to https://phabricator.wikimedia.org/P22067 and previous config saved to /var/cache/conftool/dbconfig/20220308-103017-marostegui.json
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 10:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 10:27 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
  • 10:27 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 10:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 10:22 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P22066 and previous config saved to /var/cache/conftool/dbconfig/20220308-101739-marostegui.json
  • 10:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 10:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2035.codfw.wmnet with reason: host reimage
  • 10:12 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 10:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22065 and previous config saved to /var/cache/conftool/dbconfig/20220308-100559-marostegui.json
  • 10:03 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P22064 and previous config saved to /var/cache/conftool/dbconfig/20220308-100234-marostegui.json
  • 09:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2035.codfw.wmnet with OS buster
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22063 and previous config saved to /var/cache/conftool/dbconfig/20220308-095055-marostegui.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22062 and previous config saved to /var/cache/conftool/dbconfig/20220308-094730-marostegui.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T298294)', diff saved to https://phabricator.wikimedia.org/P22061 and previous config saved to /var/cache/conftool/dbconfig/20220308-094613-marostegui.json
  • 09:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22060 and previous config saved to /var/cache/conftool/dbconfig/20220308-094605-marostegui.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300775)', diff saved to https://phabricator.wikimedia.org/P22059 and previous config saved to /var/cache/conftool/dbconfig/20220308-094354-marostegui.json
  • 09:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P22058 and previous config saved to /var/cache/conftool/dbconfig/20220308-094155-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P22057 and previous config saved to /var/cache/conftool/dbconfig/20220308-093550-marostegui.json
  • 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2022.codfw.wmnet
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P22056 and previous config saved to /var/cache/conftool/dbconfig/20220308-093101-marostegui.json
  • 09:27 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2022.codfw.wmnet
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22055 and previous config saved to /var/cache/conftool/dbconfig/20220308-092651-root.json
  • 09:26 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2021.codfw.wmnet
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22054 and previous config saved to /var/cache/conftool/dbconfig/20220308-092045-marostegui.json
  • 09:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2021.codfw.wmnet
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P22053 and previous config saved to /var/cache/conftool/dbconfig/20220308-091556-marostegui.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22052 and previous config saved to /var/cache/conftool/dbconfig/20220308-091147-root.json
  • 09:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2020.codfw.wmnet
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T300381)', diff saved to https://phabricator.wikimedia.org/P22051 and previous config saved to /var/cache/conftool/dbconfig/20220308-090531-marostegui.json
  • 09:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2020.codfw.wmnet
  • 09:00 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2019.codfw.wmnet
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22050 and previous config saved to /var/cache/conftool/dbconfig/20220308-090051-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T298294)', diff saved to https://phabricator.wikimedia.org/P22049 and previous config saved to /var/cache/conftool/dbconfig/20220308-085934-marostegui.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22048 and previous config saved to /var/cache/conftool/dbconfig/20220308-085921-marostegui.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22047 and previous config saved to /var/cache/conftool/dbconfig/20220308-085644-root.json
  • 08:54 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2019.codfw.wmnet
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P22046 and previous config saved to /var/cache/conftool/dbconfig/20220308-084416-marostegui.json
  • 08:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22045 and previous config saved to /var/cache/conftool/dbconfig/20220308-084148-marostegui.json
  • 08:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2018.codfw.wmnet
  • 08:32 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2018.codfw.wmnet
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P22044 and previous config saved to /var/cache/conftool/dbconfig/20220308-082912-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22043 and previous config saved to /var/cache/conftool/dbconfig/20220308-082643-marostegui.json
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:14 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Add image experiment for fa/fr/pt/trwiki (T302828) (duration: 00m 49s)
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22042 and previous config saved to /var/cache/conftool/dbconfig/20220308-081407-marostegui.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P22041 and previous config saved to /var/cache/conftool/dbconfig/20220308-081138-marostegui.json
  • 08:11 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1004.eqiad.wmnet
  • 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1004.eqiad.wmnet
  • 08:01 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22040 and previous config saved to /var/cache/conftool/dbconfig/20220308-075634-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T298294)', diff saved to https://phabricator.wikimedia.org/P22039 and previous config saved to /var/cache/conftool/dbconfig/20220308-075345-marostegui.json
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22038 and previous config saved to /var/cache/conftool/dbconfig/20220308-075338-marostegui.json
  • 07:53 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T300381)', diff saved to https://phabricator.wikimedia.org/P22037 and previous config saved to /var/cache/conftool/dbconfig/20220308-074136-marostegui.json
  • 07:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P22036 and previous config saved to /var/cache/conftool/dbconfig/20220308-073833-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P22035 and previous config saved to /var/cache/conftool/dbconfig/20220308-072329-marostegui.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22034 and previous config saved to /var/cache/conftool/dbconfig/20220308-071724-marostegui.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22033 and previous config saved to /var/cache/conftool/dbconfig/20220308-070824-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22032 and previous config saved to /var/cache/conftool/dbconfig/20220308-070728-marostegui.json
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22031 and previous config saved to /var/cache/conftool/dbconfig/20220308-070721-marostegui.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22030 and previous config saved to /var/cache/conftool/dbconfig/20220308-070219-marostegui.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P22029 and previous config saved to /var/cache/conftool/dbconfig/20220308-065216-marostegui.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P22028 and previous config saved to /var/cache/conftool/dbconfig/20220308-064714-marostegui.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P22027 and previous config saved to /var/cache/conftool/dbconfig/20220308-063711-marostegui.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22026 and previous config saved to /var/cache/conftool/dbconfig/20220308-063210-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22025 and previous config saved to /var/cache/conftool/dbconfig/20220308-062206-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T298294)', diff saved to https://phabricator.wikimedia.org/P22024 and previous config saved to /var/cache/conftool/dbconfig/20220308-062100-marostegui.json
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300775)', diff saved to https://phabricator.wikimedia.org/P22023 and previous config saved to /var/cache/conftool/dbconfig/20220308-061842-marostegui.json
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300381)', diff saved to https://phabricator.wikimedia.org/P22022 and previous config saved to /var/cache/conftool/dbconfig/20220308-061700-marostegui.json
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P22021 and previous config saved to /var/cache/conftool/dbconfig/20220308-061609-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P22020 and previous config saved to /var/cache/conftool/dbconfig/20220308-060106-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P22019 and previous config saved to /var/cache/conftool/dbconfig/20220308-054602-root.json
  • 02:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:57 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 04s)
  • 01:57 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
  • 01:32 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@1c598f5]: (no justification provided) (duration: 00m 08s)
  • 01:31 ebysans@deploy1002: Started deploy [airflow-dags/analytics@1c598f5]: (no justification provided)
  • 01:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@21af07c]: (no justification provided) (duration: 00m 07s)
  • 01:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@21af07c]: (no justification provided)
  • 01:11 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 04s)
  • 01:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
  • 01:07 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c47e886]: (no justification provided) (duration: 00m 08s)
  • 01:07 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c47e886]: (no justification provided)
  • 00:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@c8a753b]: (no justification provided) (duration: 00m 07s)
  • 00:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@c8a753b]: (no justification provided)
  • 00:08 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@b5f7840]: (no justification provided) (duration: 00m 08s)
  • 00:08 ebysans@deploy1002: Started deploy [airflow-dags/analytics@b5f7840]: (no justification provided)

2022-03-07

  • 23:50 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx2001.wikimedia.org with reason: reboot
  • 23:50 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx2001.wikimedia.org with reason: reboot
  • 23:49 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mx1001.wikimedia.org with reason: reboot
  • 23:49 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mx1001.wikimedia.org with reason: reboot
  • 23:40 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
  • 23:40 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on mirror1001.wikimedia.org with reason: reboot
  • 22:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:25 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
  • 22:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 22:18 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 21:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 21:38 urbanecm: UTC late B&C window done
  • 21:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:37 urbanecm@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: eac551c: Fix language alert regression (T302018) (duration: 00m 50s)
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:22 eileen: config aa7dcd88 -> 16fa8e1c
  • 20:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices1003.wikimedia.org with OS bullseye
  • 20:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 20:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1003.wikimedia.org with reason: host reimage
  • 19:49 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1003.wikimedia.org with OS bullseye
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22016 and previous config saved to /var/cache/conftool/dbconfig/20220307-181310-marostegui.json
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22015 and previous config saved to /var/cache/conftool/dbconfig/20220307-175805-marostegui.json
  • 17:55 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2002.codfw.wmnet
  • 17:49 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 17:47 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2002.codfw.wmnet
  • 17:47 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2002.codfw.wmnet
  • 17:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P22014 and previous config saved to /var/cache/conftool/dbconfig/20220307-174300-marostegui.json
  • 17:36 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1004.wikimedia.org with OS bullseye
  • 17:29 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1022.eqiad.wmnet
  • 17:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22013 and previous config saved to /var/cache/conftool/dbconfig/20220307-172755-marostegui.json
  • 17:24 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1022.eqiad.wmnet
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22012 and previous config saved to /var/cache/conftool/dbconfig/20220307-172134-marostegui.json
  • 17:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22011 and previous config saved to /var/cache/conftool/dbconfig/20220307-172126-marostegui.json
  • 17:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
  • 17:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp5004.eqsin.wmnet with reason: HW issues see T303043
  • 17:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
  • 17:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3058.esams.wmnet with OS buster
  • 17:07 vgutierrez: pool cp3058 with HAProxy as TLS termination layer - T290005
  • 17:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1004.wikimedia.org with reason: host reimage
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22010 and previous config saved to /var/cache/conftool/dbconfig/20220307-170622-marostegui.json
  • 17:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 16:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 16:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 16:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 16:54 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1004.wikimedia.org with OS bullseye
  • 16:52 vgutierrez: depool cp5004 - T303043
  • 16:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P22009 and previous config saved to /var/cache/conftool/dbconfig/20220307-165117-marostegui.json
  • 16:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 16:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 16:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 16:45 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 16:45 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3058.esams.wmnet with reason: host reimage
  • 16:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 16:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 16:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5010.eqsin.wmnet with OS buster
  • 16:41 vgutierrez: pool cp5010 with HAProxy as TLS termination layer - T290005
  • 16:38 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22008 and previous config saved to /var/cache/conftool/dbconfig/20220307-163612-marostegui.json
  • 16:36 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 16:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2003.codfw.wmnet
  • 16:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 16:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P22007 and previous config saved to /var/cache/conftool/dbconfig/20220307-162821-marostegui.json
  • 16:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 16:27 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 16:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 16:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 16:22 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22006 and previous config saved to /var/cache/conftool/dbconfig/20220307-162157-marostegui.json
  • 16:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
  • 16:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 16:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3058.esams.wmnet with OS buster
  • 16:17 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
  • 16:17 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 16:16 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:16 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 16:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
  • 16:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5010.eqsin.wmnet with reason: host reimage
  • 16:14 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 16:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
  • 16:09 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 16:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22005 and previous config saved to /var/cache/conftool/dbconfig/20220307-160650-marostegui.json
  • 16:06 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
  • 16:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
  • 16:04 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 16:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 16:04 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 16:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 16:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 15:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 15:58 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 15:56 jayme: eqiad: kubectl -n istio-system delete po istiod-69d679d8b5-hm64j - T303184
  • 15:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P22004 and previous config saved to /var/cache/conftool/dbconfig/20220307-155146-marostegui.json
  • 15:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5010.eqsin.wmnet with OS buster
  • 15:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
  • 15:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on cp1085.eqiad.wmnet with reason: HW issues see T303183
  • 15:38 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1085.eqiad.wmnet with OS buster
  • 15:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22003 and previous config saved to /var/cache/conftool/dbconfig/20220307-153641-marostegui.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300381)', diff saved to https://phabricator.wikimedia.org/P22002 and previous config saved to /var/cache/conftool/dbconfig/20220307-153357-marostegui.json
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P22001 and previous config saved to /var/cache/conftool/dbconfig/20220307-153343-marostegui.json
  • 15:20 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@7642d65]: (no justification provided) (duration: 00m 07s)
  • 15:20 ntsako@deploy1002: Started deploy [airflow-dags/analytics@7642d65]: (no justification provided)
  • 15:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P22000 and previous config saved to /var/cache/conftool/dbconfig/20220307-151929-root.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21999 and previous config saved to /var/cache/conftool/dbconfig/20220307-151839-marostegui.json
  • 15:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:09 ntsako@deploy1002: Finished deploy [airflow-dags/analytics_test@7642d65]: (no justification provided) (duration: 00m 09s)
  • 15:09 ntsako@deploy1002: Started deploy [airflow-dags/analytics_test@7642d65]: (no justification provided)
  • 15:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21998 and previous config saved to /var/cache/conftool/dbconfig/20220307-150426-root.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:03 vgutierrez: pool cp4030 with HAProxy as TLS termination layer - T290005
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21997 and previous config saved to /var/cache/conftool/dbconfig/20220307-150334-marostegui.json
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 15:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4030.ulsfo.wmnet with OS buster
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2088.codfw.wmnet with reason: Maintenance
  • 14:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 14:56 vgutierrez: depool cp1085
  • 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21996 and previous config saved to /var/cache/conftool/dbconfig/20220307-144922-root.json
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21995 and previous config saved to /var/cache/conftool/dbconfig/20220307-144829-marostegui.json
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:45 vgutierrez: pool cp1085 with HAProxy as TLS termination layer - T290005
  • 14:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 14:37 urbanecm@deploy1002: Synchronized static/images/project-logos/: f50c474: Revert "Change temporary logo for slwiki" (T302661; 2/2) (duration: 00m 48s)
  • 14:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:36 urbanecm@deploy1002: Synchronized wmf-config/logos.php: f50c474: Revert "Change temporary logo for slwiki" (T302661; 1/2) (duration: 00m 49s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:35 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: (no justification provided) (duration: 00m 04s)
  • 14:35 ntsako@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: (no justification provided)
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 14:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
  • 14:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21994 and previous config saved to /var/cache/conftool/dbconfig/20220307-143419-root.json
  • 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21993 and previous config saved to /var/cache/conftool/dbconfig/20220307-143229-ladsgroup.json
  • 14:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4030.ulsfo.wmnet with reason: host reimage
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:30 moritzm: rebooting etherpad1003 for kernel update
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 14:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
  • 14:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1085.eqiad.wmnet with reason: host reimage
  • 14:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1002.eqiad.wmnet
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1002.eqiad.wmnet
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21992 and previous config saved to /var/cache/conftool/dbconfig/20220307-141915-root.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21991 and previous config saved to /var/cache/conftool/dbconfig/20220307-141911-root.json
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 64b1284: Enable reply tool by default on enwiki (T296645) (duration: 00m 49s)
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21990 and previous config saved to /var/cache/conftool/dbconfig/20220307-141724-ladsgroup.json
  • 14:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8f20ec9: fawiki: Disable creating community books and remove "Create a book" link from sidebar (T303173) (duration: 00m 49s)
  • 14:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4030.ulsfo.wmnet with OS buster
  • 14:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 14:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1085.eqiad.wmnet with OS buster
  • 14:08 urbanecm@deploy1002: Synchronized logos/config.yaml: 8619f59: etwikiquote: Update logo (T302683; 3/3) (duration: 00m 49s)
  • 14:07 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 8619f59: etwikiquote: Update logo (T302683; 2/3) (duration: 00m 49s)
  • 14:07 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/etwikiquote.png (T302683)
  • 14:07 urbanecm@deploy1002: Synchronized static/images/project-logos/: 8619f59: etwikiquote: Update logo (T302683; 1/3) (duration: 00m 50s)
  • 14:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21989 and previous config saved to /var/cache/conftool/dbconfig/20220307-140408-root.json
  • 14:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2037.codfw.wmnet with OS buster
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21988 and previous config saved to /var/cache/conftool/dbconfig/20220307-140219-ladsgroup.json
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 14:00 kormat: removing cumin2001 grants from all db sections T276589
  • 14:00 vgutierrez: pool cp2037 with HAProxy as TLS termination layer - T290005
  • 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21987 and previous config saved to /var/cache/conftool/dbconfig/20220307-135614-ladsgroup.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21986 and previous config saved to /var/cache/conftool/dbconfig/20220307-134904-root.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300381)', diff saved to https://phabricator.wikimedia.org/P21985 and previous config saved to /var/cache/conftool/dbconfig/20220307-134848-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21984 and previous config saved to /var/cache/conftool/dbconfig/20220307-134840-marostegui.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21983 and previous config saved to /var/cache/conftool/dbconfig/20220307-134715-ladsgroup.json
  • 13:47 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 07m 17s)
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21982 and previous config saved to /var/cache/conftool/dbconfig/20220307-134109-ladsgroup.json
  • 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (hadoop-test): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:39 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 00m 08s)
  • 13:39 aqu@deploy1002: Started deploy [analytics/refinery@51d074b] (thin): Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:37 aqu@deploy1002: Finished deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b] (duration: 25m 04s)
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21981 and previous config saved to /var/cache/conftool/dbconfig/20220307-133400-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21980 and previous config saved to /var/cache/conftool/dbconfig/20220307-133335-marostegui.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P21979 and previous config saved to /var/cache/conftool/dbconfig/20220307-132605-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1142.eqiad.wmnet with OS bullseye
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21978 and previous config saved to /var/cache/conftool/dbconfig/20220307-131857-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21977 and previous config saved to /var/cache/conftool/dbconfig/20220307-131830-marostegui.json
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P21976 and previous config saved to /var/cache/conftool/dbconfig/20220307-131606-marostegui.json
  • 13:12 aqu@deploy1002: Started deploy [analytics/refinery@51d074b]: Migrate wikidata/item_page_link/weekly from Oozie to Airflow [analytics/refinery@51d074b]
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21975 and previous config saved to /var/cache/conftool/dbconfig/20220307-131100-ladsgroup.json
  • 13:09 aqu_: About to deploy analytics/refinery - Migrate wikidata/item_page_link/weekly from Oozie to Airflow
  • 13:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21974 and previous config saved to /var/cache/conftool/dbconfig/20220307-130520-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21973 and previous config saved to /var/cache/conftool/dbconfig/20220307-130512-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1142.eqiad.wmnet with reason: host reimage
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21972 and previous config saved to /var/cache/conftool/dbconfig/20220307-130326-marostegui.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300381)', diff saved to https://phabricator.wikimedia.org/P21971 and previous config saved to /var/cache/conftool/dbconfig/20220307-125540-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21970 and previous config saved to /var/cache/conftool/dbconfig/20220307-125532-marostegui.json
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1142.eqiad.wmnet with OS bullseye
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21969 and previous config saved to /var/cache/conftool/dbconfig/20220307-125007-ladsgroup.json
  • 12:49 aqu@deploy1002: Finished deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly (duration: 00m 07s)
  • 12:49 aqu@deploy1002: Started deploy [airflow-dags/analytics@46d88a2]: Migrate wikidata/item_page_link/weekly
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T302950)', diff saved to https://phabricator.wikimedia.org/P21968 and previous config saved to /var/cache/conftool/dbconfig/20220307-124815-ladsgroup.json
  • 12:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21967 and previous config saved to /var/cache/conftool/dbconfig/20220307-124028-marostegui.json
  • 12:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2037.codfw.wmnet with reason: host reimage
  • 12:37 XioNoX: restart cr1-drmrs for software upgrade
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P21966 and previous config saved to /var/cache/conftool/dbconfig/20220307-123503-ladsgroup.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21965 and previous config saved to /var/cache/conftool/dbconfig/20220307-122523-marostegui.json
  • 12:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2037.codfw.wmnet with OS buster
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21964 and previous config saved to /var/cache/conftool/dbconfig/20220307-121958-ladsgroup.json
  • 12:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3060.esams.wmnet with OS buster
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T300992)', diff saved to https://phabricator.wikimedia.org/P21963 and previous config saved to /var/cache/conftool/dbconfig/20220307-121443-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 12:13 vgutierrez: pool cp3060 with HAProxy as TLS termination layer - T290005
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T300775)', diff saved to https://phabricator.wikimedia.org/P21962 and previous config saved to /var/cache/conftool/dbconfig/20220307-121122-marostegui.json
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21961 and previous config saved to /var/cache/conftool/dbconfig/20220307-121018-marostegui.json
  • 12:10 XioNoX: reboot cr2-drmrs for software upgrade
  • 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21960 and previous config saved to /var/cache/conftool/dbconfig/20220307-120821-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21959 and previous config saved to /var/cache/conftool/dbconfig/20220307-120722-ladsgroup.json
  • 12:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 12:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300381)', diff saved to https://phabricator.wikimedia.org/P21958 and previous config saved to /var/cache/conftool/dbconfig/20220307-120532-marostegui.json
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS buster
  • 12:03 vgutierrez: pool cp5016 with HAProxy as TLS termination layer - T290005
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 11:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21957 and previous config saved to /var/cache/conftool/dbconfig/20220307-115337-marostegui.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21956 and previous config saved to /var/cache/conftool/dbconfig/20220307-115316-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21955 and previous config saved to /var/cache/conftool/dbconfig/20220307-115217-ladsgroup.json
  • 11:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3060.esams.wmnet with reason: host reimage
  • 11:45 XioNoX: remove MTU1400 on drmrs GTT links
  • 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21954 and previous config saved to /var/cache/conftool/dbconfig/20220307-113833-marostegui.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P21953 and previous config saved to /var/cache/conftool/dbconfig/20220307-113811-ladsgroup.json
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21952 and previous config saved to /var/cache/conftool/dbconfig/20220307-113712-ladsgroup.json
  • 11:36 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 11:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 11:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.prepare-upgrade (exit_code=0)
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21951 and previous config saved to /var/cache/conftool/dbconfig/20220307-112328-marostegui.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21950 and previous config saved to /var/cache/conftool/dbconfig/20220307-112307-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21949 and previous config saved to /var/cache/conftool/dbconfig/20220307-112207-ladsgroup.json
  • 11:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3060.esams.wmnet with OS buster
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21948 and previous config saved to /var/cache/conftool/dbconfig/20220307-111834-root.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T300992)', diff saved to https://phabricator.wikimedia.org/P21947 and previous config saved to /var/cache/conftool/dbconfig/20220307-111816-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21946 and previous config saved to /var/cache/conftool/dbconfig/20220307-111809-ladsgroup.json
  • 11:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4036.ulsfo.wmnet with OS buster
  • 11:12 vgutierrez: pool cp4036 with HAProxy as TLS termination layer - T290005
  • 11:10 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS buster
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21945 and previous config saved to /var/cache/conftool/dbconfig/20220307-110823-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21944 and previous config saved to /var/cache/conftool/dbconfig/20220307-110330-root.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21943 and previous config saved to /var/cache/conftool/dbconfig/20220307-110304-ladsgroup.json
  • 11:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1143.eqiad.wmnet with OS bullseye
  • 11:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1084.eqiad.wmnet with OS buster
  • 10:59 vgutierrez: pool cp1084 with HAProxy as TLS termination layer - T290005
  • 10:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
  • 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4036.ulsfo.wmnet with reason: host reimage
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300381)', diff saved to https://phabricator.wikimedia.org/P21942 and previous config saved to /var/cache/conftool/dbconfig/20220307-104906-marostegui.json
  • 10:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21941 and previous config saved to /var/cache/conftool/dbconfig/20220307-104826-root.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P21940 and previous config saved to /var/cache/conftool/dbconfig/20220307-104759-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1143.eqiad.wmnet with reason: host reimage
  • 10:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 10:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4036.ulsfo.wmnet with OS buster
  • 10:34 jayme: (re)started ferm on kubernetes1001
  • 10:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1084.eqiad.wmnet with reason: host reimage
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21939 and previous config saved to /var/cache/conftool/dbconfig/20220307-103323-root.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21938 and previous config saved to /var/cache/conftool/dbconfig/20220307-103253-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1143.eqiad.wmnet with OS bullseye
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T300992)', diff saved to https://phabricator.wikimedia.org/P21937 and previous config saved to /var/cache/conftool/dbconfig/20220307-102737-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21936 and previous config saved to /var/cache/conftool/dbconfig/20220307-102730-ladsgroup.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312', diff saved to https://phabricator.wikimedia.org/P21935 and previous config saved to /var/cache/conftool/dbconfig/20220307-102209-marostegui.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T302950)', diff saved to https://phabricator.wikimedia.org/P21934 and previous config saved to /var/cache/conftool/dbconfig/20220307-102158-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21933 and previous config saved to /var/cache/conftool/dbconfig/20220307-102129-ladsgroup.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21932 and previous config saved to /var/cache/conftool/dbconfig/20220307-102054-marostegui.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P21931 and previous config saved to /var/cache/conftool/dbconfig/20220307-101824-marostegui.json
  • 10:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1084.eqiad.wmnet with OS buster
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21930 and previous config saved to /var/cache/conftool/dbconfig/20220307-101657-root.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21929 and previous config saved to /var/cache/conftool/dbconfig/20220307-101225-ladsgroup.json
  • 10:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2036.codfw.wmnet with OS buster
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21928 and previous config saved to /var/cache/conftool/dbconfig/20220307-100624-ladsgroup.json
  • 10:04 vgutierrez: pool cp2036 with HAProxy as TLS termination layer - T290005
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21927 and previous config saved to /var/cache/conftool/dbconfig/20220307-100153-root.json
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:58 ayounsi@cumin1001: START - Cookbook sre.network.prepare-upgrade
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P21926 and previous config saved to /var/cache/conftool/dbconfig/20220307-095720-ladsgroup.json
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21925 and previous config saved to /var/cache/conftool/dbconfig/20220307-095120-ladsgroup.json
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21924 and previous config saved to /var/cache/conftool/dbconfig/20220307-095111-root.json
  • 09:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21923 and previous config saved to /var/cache/conftool/dbconfig/20220307-094649-root.json
  • 09:46 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2036.codfw.wmnet with reason: host reimage
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21922 and previous config saved to /var/cache/conftool/dbconfig/20220307-094216-ladsgroup.json
  • 09:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T300992)', diff saved to https://phabricator.wikimedia.org/P21921 and previous config saved to /var/cache/conftool/dbconfig/20220307-093701-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21920 and previous config saved to /var/cache/conftool/dbconfig/20220307-093653-ladsgroup.json
  • 09:36 jynus: updated non-A wikipedia.org DNS records T302617
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21919 and previous config saved to /var/cache/conftool/dbconfig/20220307-093615-ladsgroup.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21918 and previous config saved to /var/cache/conftool/dbconfig/20220307-093607-root.json
  • 09:35 jynus: updated non-A wikipedia.org DNS records
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21917 and previous config saved to /var/cache/conftool/dbconfig/20220307-093146-root.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21916 and previous config saved to /var/cache/conftool/dbconfig/20220307-093032-ladsgroup.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123', diff saved to https://phabricator.wikimedia.org/P21915 and previous config saved to /var/cache/conftool/dbconfig/20220307-093013-marostegui.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21914 and previous config saved to /var/cache/conftool/dbconfig/20220307-092924-root.json
  • 09:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2036.codfw.wmnet with OS buster
  • 09:22 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 04s)
  • 09:22 ebysans@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21913 and previous config saved to /var/cache/conftool/dbconfig/20220307-092148-ladsgroup.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 60%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21912 and previous config saved to /var/cache/conftool/dbconfig/20220307-092103-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T300381)', diff saved to https://phabricator.wikimedia.org/P21911 and previous config saved to /var/cache/conftool/dbconfig/20220307-092034-marostegui.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21910 and previous config saved to /var/cache/conftool/dbconfig/20220307-091527-ladsgroup.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21909 and previous config saved to /var/cache/conftool/dbconfig/20220307-091421-root.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P21908 and previous config saved to /var/cache/conftool/dbconfig/20220307-090644-ladsgroup.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21907 and previous config saved to /var/cache/conftool/dbconfig/20220307-090600-root.json
  • 09:01 dcausse: restarting blazegraph on wdqs1013 (JVM stuck for 6 hours)
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21906 and previous config saved to /var/cache/conftool/dbconfig/20220307-090021-ladsgroup.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21905 and previous config saved to /var/cache/conftool/dbconfig/20220307-085917-root.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21904 and previous config saved to /var/cache/conftool/dbconfig/20220307-085139-ladsgroup.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 40%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21903 and previous config saved to /var/cache/conftool/dbconfig/20220307-085056-root.json
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:46 elukey: `kafka configs --alter --entity-type topics --entity-name udp_localhost-info --add-config retention.bytes=300000000000` on kafka-logging to reduce the size of the biggest topic partitions
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T300992)', diff saved to https://phabricator.wikimedia.org/P21902 and previous config saved to /var/cache/conftool/dbconfig/20220307-084641-ladsgroup.json
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21901 and previous config saved to /var/cache/conftool/dbconfig/20220307-084516-ladsgroup.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21900 and previous config saved to /var/cache/conftool/dbconfig/20220307-084413-root.json
  • 08:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e3f70f6: enwiki: Deploy Growth features to 100% of users (T302846) (duration: 00m 50s)
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P21899 and previous config saved to /var/cache/conftool/dbconfig/20220307-084235-marostegui.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21898 and previous config saved to /var/cache/conftool/dbconfig/20220307-084219-root.json
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21897 and previous config saved to /var/cache/conftool/dbconfig/20220307-083948-ladsgroup.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21896 and previous config saved to /var/cache/conftool/dbconfig/20220307-083553-root.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21895 and previous config saved to /var/cache/conftool/dbconfig/20220307-082716-root.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21894 and previous config saved to /var/cache/conftool/dbconfig/20220307-082443-ladsgroup.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 20%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21893 and previous config saved to /var/cache/conftool/dbconfig/20220307-082049-root.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21892 and previous config saved to /var/cache/conftool/dbconfig/20220307-081212-root.json
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P21891 and previous config saved to /var/cache/conftool/dbconfig/20220307-080938-ladsgroup.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: repooling after schema change', diff saved to https://phabricator.wikimedia.org/P21890 and previous config saved to /var/cache/conftool/dbconfig/20220307-080545-root.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1144.eqiad.wmnet with OS bullseye
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21889 and previous config saved to /var/cache/conftool/dbconfig/20220307-075708-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P21888 and previous config saved to /var/cache/conftool/dbconfig/20220307-075523-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21887 and previous config saved to /var/cache/conftool/dbconfig/20220307-075504-root.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21886 and previous config saved to /var/cache/conftool/dbconfig/20220307-075433-ladsgroup.json
  • 07:53 marostegui: dbmaint on db1181 s7@eqiad T276150
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P21885 and previous config saved to /var/cache/conftool/dbconfig/20220307-075120-marostegui.json
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T300992)', diff saved to https://phabricator.wikimedia.org/P21884 and previous config saved to /var/cache/conftool/dbconfig/20220307-074923-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21883 and previous config saved to /var/cache/conftool/dbconfig/20220307-074909-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
  • 07:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1144.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21882 and previous config saved to /var/cache/conftool/dbconfig/20220307-074001-root.json
  • 07:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21881 and previous config saved to /var/cache/conftool/dbconfig/20220307-073405-ladsgroup.json
  • 07:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1144.eqiad.wmnet with OS bullseye
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T302950)', diff saved to https://phabricator.wikimedia.org/P21880 and previous config saved to /var/cache/conftool/dbconfig/20220307-072624-ladsgroup.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21879 and previous config saved to /var/cache/conftool/dbconfig/20220307-072457-root.json
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T302950)', diff saved to https://phabricator.wikimedia.org/P21878 and previous config saved to /var/cache/conftool/dbconfig/20220307-072453-ladsgroup.json
  • 07:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P21877 and previous config saved to /var/cache/conftool/dbconfig/20220307-071900-ladsgroup.json
  • 07:15 elukey: `elukey@ml-staging-ctrl2002:~$ sudo systemctl reset-failed ifup@ens13.service`
  • 07:14 elukey: kill tmux sessions of user 'zpapierski' on wdqs[1004,2002,2003] (puppet broken, offboarded user)
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21876 and previous config saved to /var/cache/conftool/dbconfig/20220307-071227-ladsgroup.json
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After mysql restart', diff saved to https://phabricator.wikimedia.org/P21875 and previous config saved to /var/cache/conftool/dbconfig/20220307-070953-root.json
  • 07:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:06 marostegui: dbmaint on db1179 s3@eqiad T302222
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P21874 and previous config saved to /var/cache/conftool/dbconfig/20220307-070537-marostegui.json
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21873 and previous config saved to /var/cache/conftool/dbconfig/20220307-070355-ladsgroup.json
  • 07:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T300992)', diff saved to https://phabricator.wikimedia.org/P21872 and previous config saved to /var/cache/conftool/dbconfig/20220307-065839-ladsgroup.json
  • 06:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21871 and previous config saved to /var/cache/conftool/dbconfig/20220307-065832-ladsgroup.json
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21870 and previous config saved to /var/cache/conftool/dbconfig/20220307-065722-ladsgroup.json
  • 06:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:52 urbanecm: Reset authentication throttle for 217.23.37.10 via resetAuthenticationThrottle.php (T302973)
  • 06:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:49 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 2e9fdd4: 867bb7b: Add throttle rules (T302973; T303002) (duration: 00m 49s)
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21869 and previous config saved to /var/cache/conftool/dbconfig/20220307-064327-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21868 and previous config saved to /var/cache/conftool/dbconfig/20220307-064217-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164', diff saved to https://phabricator.wikimedia.org/P21867 and previous config saved to /var/cache/conftool/dbconfig/20220307-062823-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21866 and previous config saved to /var/cache/conftool/dbconfig/20220307-062713-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21865 and previous config saved to /var/cache/conftool/dbconfig/20220307-061318-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1164 (T300992)', diff saved to https://phabricator.wikimedia.org/P21864 and previous config saved to /var/cache/conftool/dbconfig/20220307-060819-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1164.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21863 and previous config saved to /var/cache/conftool/dbconfig/20220307-060811-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21862 and previous config saved to /var/cache/conftool/dbconfig/20220307-055307-ladsgroup.json
  • 05:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1147.eqiad.wmnet with OS bullseye
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P21861 and previous config saved to /var/cache/conftool/dbconfig/20220307-053802-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
  • 05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1147.eqiad.wmnet with reason: host reimage
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21860 and previous config saved to /var/cache/conftool/dbconfig/20220307-052257-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1147.eqiad.wmnet with OS bullseye
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T300992)', diff saved to https://phabricator.wikimedia.org/P21859 and previous config saved to /var/cache/conftool/dbconfig/20220307-051807-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21858 and previous config saved to /var/cache/conftool/dbconfig/20220307-051537-ladsgroup.json
  • 05:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance

2022-03-04

  • 17:59 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 17:57 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 mforns@deploy1002: Finished deploy [airflow-dags/analytics@19520c1]: (no justification provided) (duration: 00m 07s)
  • 17:46 mforns@deploy1002: Started deploy [airflow-dags/analytics@19520c1]: (no justification provided)
  • 17:39 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@19520c1]: (no justification provided) (duration: 00m 08s)
  • 17:39 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@19520c1]: (no justification provided)
  • 17:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 08s)
  • 17:09 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:35 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
  • 16:35 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 10s)
  • 16:13 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21856 and previous config saved to /var/cache/conftool/dbconfig/20220304-160629-ladsgroup.json
  • 16:03 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 03s)
  • 16:03 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1086.eqiad.wmnet with OS buster
  • 15:58 vgutierrez: pool cp1086 with HAProxy as TLS termination layer - T290005
  • 15:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2038.codfw.wmnet with OS buster
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21854 and previous config saved to /var/cache/conftool/dbconfig/20220304-155124-ladsgroup.json
  • 15:51 vgutierrez: pool cp2038 with HAProxy as TLS termination layer - T290005
  • 15:49 mforns@deploy1002: Finished deploy [airflow-dags/analytics_test@1388c61]: (no justification provided) (duration: 00m 07s)
  • 15:49 mforns@deploy1002: Started deploy [airflow-dags/analytics_test@1388c61]: (no justification provided)
  • 15:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 15:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1086.eqiad.wmnet with reason: host reimage
  • 15:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P21852 and previous config saved to /var/cache/conftool/dbconfig/20220304-153619-ladsgroup.json
  • 15:34 XioNoX: blackhole IPs - T303055
  • 15:34 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2038.codfw.wmnet with reason: host reimage
  • 15:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1086.eqiad.wmnet with OS buster
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21851 and previous config saved to /var/cache/conftool/dbconfig/20220304-152114-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21850 and previous config saved to /var/cache/conftool/dbconfig/20220304-152007-ladsgroup.json
  • 15:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2079.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21849 and previous config saved to /var/cache/conftool/dbconfig/20220304-151937-ladsgroup.json
  • 15:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2038.codfw.wmnet with OS buster
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21848 and previous config saved to /var/cache/conftool/dbconfig/20220304-150433-ladsgroup.json
  • 14:59 ebernhardson: restart elasticsearch_6@production-search-psi-eqiad.service on elastic1049 to resolve CirrusSearchJVMGCOldPoolFlatlined alert
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P21847 and previous config saved to /var/cache/conftool/dbconfig/20220304-144926-ladsgroup.json
  • 14:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3059.esams.wmnet with OS buster
  • 14:43 vgutierrez: pool cp3059 with HAProxy as TLS termination layer - T290005
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21846 and previous config saved to /var/cache/conftool/dbconfig/20220304-143421-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T300992)', diff saved to https://phabricator.wikimedia.org/P21845 and previous config saved to /var/cache/conftool/dbconfig/20220304-143214-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21844 and previous config saved to /var/cache/conftool/dbconfig/20220304-143206-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21842 and previous config saved to /var/cache/conftool/dbconfig/20220304-141701-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P21841 and previous config saved to /var/cache/conftool/dbconfig/20220304-140156-ladsgroup.json
  • 13:49 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1302-1306].eqiad.wmnet
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21840 and previous config saved to /var/cache/conftool/dbconfig/20220304-134651-ladsgroup.json
  • 13:45 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T300992)', diff saved to https://phabricator.wikimedia.org/P21839 and previous config saved to /var/cache/conftool/dbconfig/20220304-134443-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21838 and previous config saved to /var/cache/conftool/dbconfig/20220304-134436-ladsgroup.json
  • 13:38 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21837 and previous config saved to /var/cache/conftool/dbconfig/20220304-132931-ladsgroup.json
  • 13:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1302-1306].eqiad.wmnet
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P21836 and previous config saved to /var/cache/conftool/dbconfig/20220304-131426-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21835 and previous config saved to /var/cache/conftool/dbconfig/20220304-125921-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T300992)', diff saved to https://phabricator.wikimedia.org/P21834 and previous config saved to /var/cache/conftool/dbconfig/20220304-125714-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21833 and previous config saved to /var/cache/conftool/dbconfig/20220304-125706-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21832 and previous config saved to /var/cache/conftool/dbconfig/20220304-124201-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P21831 and previous config saved to /var/cache/conftool/dbconfig/20220304-122656-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21830 and previous config saved to /var/cache/conftool/dbconfig/20220304-121152-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T300992)', diff saved to https://phabricator.wikimedia.org/P21829 and previous config saved to /var/cache/conftool/dbconfig/20220304-120944-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21828 and previous config saved to /var/cache/conftool/dbconfig/20220304-120937-ladsgroup.json
  • 12:04 jbond: enable SameSite=Strict on idp
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21827 and previous config saved to /var/cache/conftool/dbconfig/20220304-115432-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P21826 and previous config saved to /var/cache/conftool/dbconfig/20220304-113927-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21825 and previous config saved to /var/cache/conftool/dbconfig/20220304-112422-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T300992)', diff saved to https://phabricator.wikimedia.org/P21824 and previous config saved to /var/cache/conftool/dbconfig/20220304-112214-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21823 and previous config saved to /var/cache/conftool/dbconfig/20220304-112207-ladsgroup.json
  • 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3059.esams.wmnet with reason: host reimage
  • 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4024.ulsfo.wmnet with OS buster
  • 11:09 vgutierrez: pool cp4024 with HAProxy as TLS termination layer - T290005
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21822 and previous config saved to /var/cache/conftool/dbconfig/20220304-110702-ladsgroup.json
  • 10:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
  • 10:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4024.ulsfo.wmnet with reason: host reimage
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P21821 and previous config saved to /var/cache/conftool/dbconfig/20220304-105157-ladsgroup.json
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3059.esams.wmnet with OS buster
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4024.ulsfo.wmnet with OS buster
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21820 and previous config saved to /var/cache/conftool/dbconfig/20220304-103652-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T300992)', diff saved to https://phabricator.wikimedia.org/P21819 and previous config saved to /var/cache/conftool/dbconfig/20220304-103444-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300992)', diff saved to https://phabricator.wikimedia.org/P21818 and previous config saved to /var/cache/conftool/dbconfig/20220304-103437-ladsgroup.json
  • 10:29 vgutierrez: pool cp5004 with HAProxy as TLS termination layer - T290005
  • 10:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5004.eqsin.wmnet with OS buster
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21817 and previous config saved to /var/cache/conftool/dbconfig/20220304-101932-ladsgroup.json
  • 10:08 aqu@deploy1002: Finished deploy [airflow-dags/analytics@1c8384f]: AF //tion default args (duration: 00m 07s)
  • 10:08 aqu@deploy1002: Started deploy [airflow-dags/analytics@1c8384f]: AF //tion default args
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21816 and previous config saved to /var/cache/conftool/dbconfig/20220304-100427-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T300992)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20220304-094918-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T300992)', diff saved to https://phabricator.wikimedia.org/P21815 and previous config saved to /var/cache/conftool/dbconfig/20220304-094710-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21814 and previous config saved to /var/cache/conftool/dbconfig/20220304-094702-ladsgroup.json
  • 09:43 vgutierrez: restart varnish on cp3056
  • 09:41 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
  • 09:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5004.eqsin.wmnet with reason: host reimage
  • 09:37 vgutierrez: restart varnish on cp3058
  • 09:33 vgutierrez: restart varnish on cp3060
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21813 and previous config saved to /var/cache/conftool/dbconfig/20220304-093157-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21812 and previous config saved to /var/cache/conftool/dbconfig/20220304-091652-ladsgroup.json
  • 09:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5004.eqsin.wmnet with OS buster
  • 09:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts rdb[1005-1006].eqiad.wmnet
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21811 and previous config saved to /var/cache/conftool/dbconfig/20220304-090147-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T300992)', diff saved to https://phabricator.wikimedia.org/P21810 and previous config saved to /var/cache/conftool/dbconfig/20220304-085939-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21809 and previous config saved to /var/cache/conftool/dbconfig/20220304-085932-ladsgroup.json
  • 08:56 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21808 and previous config saved to /var/cache/conftool/dbconfig/20220304-084427-ladsgroup.json
  • 08:34 akosiaris: T303027 depool mw130[2-6]. Old jobrunners/videoscalers, being decommissioned
  • 08:33 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=mw130[2-6].eqiad.wmnet
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21807 and previous config saved to /var/cache/conftool/dbconfig/20220304-082922-ladsgroup.json
  • 08:23 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 08:19 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission for hosts rdb[1005-1006].eqiad.wmnet
  • 08:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21806 and previous config saved to /var/cache/conftool/dbconfig/20220304-081417-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T300992)', diff saved to https://phabricator.wikimedia.org/P21805 and previous config saved to /var/cache/conftool/dbconfig/20220304-081210-ladsgroup.json
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:27 XioNoX: push pfw policies - T303003
  • 01:35 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 01:34 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 01:34 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 01:33 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 01:33 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 01:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 01:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 01:31 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 01:30 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 01:30 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 01:29 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 01:29 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 01:27 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 01:27 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 01:25 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 01:25 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 01:24 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 01:23 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 01:23 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 01:22 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply

2022-03-03

  • 21:35 brennen: end of UTC late backport & config window / training
  • 21:30 brennen@deploy1002: Finished scap: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956) (duration: 01m 33s)
  • 21:28 brennen@deploy1002: Started scap: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956)
  • 21:28 brennen@deploy1002: Synchronized multiversion/MWRealm.php: Config: Write the same value to $wmgDatacenter(s) as to $wmfDatacenter(s) (T45956) (duration: 00m 48s)
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:35 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.38.0-wmf.24 refs T300200
  • 19:32 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to all wikis
  • 19:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/Vector/includes/SkinVector.php: Backport: Unset data-toc in SkinVector (T302461) (duration: 00m 49s)
  • 19:23 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/skins/MinervaNeue/resources/skins.minerva.base.styles/userMenu.less: Backport: Remove user navigation min width and width (T302753) (duration: 00m 51s)
  • 19:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 18:50 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 18:39 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:32 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 18:11 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard (duration: 09m 12s)
  • 18:02 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 18:02 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-puppet-dashboard
  • 17:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:58 krinkle@deploy1002: Synchronized wmf-config/: Idf7b21159423 (duration: 00m 51s)
  • 17:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:48 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:47 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21802 and previous config saved to /var/cache/conftool/dbconfig/20220303-173630-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21801 and previous config saved to /var/cache/conftool/dbconfig/20220303-172125-ladsgroup.json
  • 17:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 17:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21800 and previous config saved to /var/cache/conftool/dbconfig/20220303-170621-ladsgroup.json
  • 17:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 17:04 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 17:03 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 16:53 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 16:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 16:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21799 and previous config saved to /var/cache/conftool/dbconfig/20220303-165116-ladsgroup.json
  • 16:30 godog: roll-restart logstash to pick up config changes - T291946
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1148.eqiad.wmnet with OS bullseye
  • 16:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1148.eqiad.wmnet with reason: host reimage
  • 15:53 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1148.eqiad.wmnet with OS bullseye
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T302950)', diff saved to https://phabricator.wikimedia.org/P21798 and previous config saved to /var/cache/conftool/dbconfig/20220303-152242-ladsgroup.json
  • 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 15:21 moritzm: restarting FPM/Apache on mw job runners to pick up expat security updates
  • 15:08 mutante: T296022 - phabricator - disabled git cloning over ssh for 'stewardscripts' repo - stewards have been asked via mailing list
  • 14:48 godog: force a puppet run on cp6011 to unblock icinga and disable puppet again, cc bblack
  • 14:48 Lucas_WMDE: UTC afternoon backport window done
  • 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport: GLAM event: Update landing page content (T301097) (full sync because of i18n change) (duration: 09m 45s)
  • 14:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport: GLAM event: Update landing page content (T301097) (full sync because of i18n change)
  • 14:26 XioNoX: merge Icinga: use parent switch shortname
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 14:04 volans: upgraded spicerack to v2.1.0 on cumin1001/cumin2002
  • 14:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21794 and previous config saved to /var/cache/conftool/dbconfig/20220303-135737-ladsgroup.json
  • 13:54 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:54 akosiaris: switch changeprop, changeprop-jobqueue to use rdb1011. T281217
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:53 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:52 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 13:51 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 13:45 akosiaris: roll restart ores uwsgi and celery for rdb1005 decommissioning. T281217
  • 13:44 akosiaris@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21793 and previous config saved to /var/cache/conftool/dbconfig/20220303-134232-ladsgroup.json
  • 13:20 moritzm: restarting FPM/Apache on mw app servers to pick up expat security updates
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21791 and previous config saved to /var/cache/conftool/dbconfig/20220303-131223-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1149.eqiad.wmnet with OS bullseye
  • 12:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
  • 12:47 hashar: Upgrading Quibble on CI Jenkins jobs from 1.3.0 to 1.4.3 https://gerrit.wikimedia.org/r/c/integration/config/+/767749/
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1149.eqiad.wmnet with reason: host reimage
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1149.eqiad.wmnet with OS bullseye
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T302950)', diff saved to https://phabricator.wikimedia.org/P21790 and previous config saved to /var/cache/conftool/dbconfig/20220303-123030-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 11:49 volans: uploaded spicerack_2.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21789 and previous config saved to /var/cache/conftool/dbconfig/20220303-113304-kormat.json
  • 11:18 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21788 and previous config saved to /var/cache/conftool/dbconfig/20220303-111801-kormat.json
  • 11:02 kormat@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repooling to 100% after incident', diff saved to https://phabricator.wikimedia.org/P21787 and previous config saved to /var/cache/conftool/dbconfig/20220303-110257-kormat.json
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21786 and previous config saved to /var/cache/conftool/dbconfig/20220303-110224-ladsgroup.json
  • 11:02 kormat@cumin1001: dbctl commit (dc=all): 'Start repooling db1126 to full weight', diff saved to https://phabricator.wikimedia.org/P21785 and previous config saved to /var/cache/conftool/dbconfig/20220303-110220-kormat.json
  • 10:58 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493) (duration: 00m 50s)
  • 10:50 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.24/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Change getConnectionRef to return with getLazyConnectionRef (T255493) (duration: 00m 51s)
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21784 and previous config saved to /var/cache/conftool/dbconfig/20220303-104713-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21783 and previous config saved to /var/cache/conftool/dbconfig/20220303-103659-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21782 and previous config saved to /var/cache/conftool/dbconfig/20220303-103209-ladsgroup.json
  • 10:30 XioNoX: repool ulsfo
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21781 and previous config saved to /var/cache/conftool/dbconfig/20220303-102154-ladsgroup.json
  • 10:18 elukey: kubectl cordon kubernetes200[1-4] to avoid scheduling pods on nodes that will be decommed during the next weeks - T302208
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21780 and previous config saved to /var/cache/conftool/dbconfig/20220303-101704-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1160.eqiad.wmnet with OS bullseye
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P21779 and previous config saved to /var/cache/conftool/dbconfig/20220303-100649-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21778 and previous config saved to /var/cache/conftool/dbconfig/20220303-095145-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1160.eqiad.wmnet with reason: host reimage
  • 09:37 aqu@deploy1002: Finished deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args (duration: 00m 09s)
  • 09:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1160.eqiad.wmnet with OS bullseye
  • 09:37 aqu@deploy1002: Started deploy [airflow-dags/analytics_test@1c8384f]: AF //tion default args
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T302950)', diff saved to https://phabricator.wikimedia.org/P21777 and previous config saved to /var/cache/conftool/dbconfig/20220303-093306-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2073 (T302950)', diff saved to https://phabricator.wikimedia.org/P21775 and previous config saved to /var/cache/conftool/dbconfig/20220303-091340-ladsgroup.json
  • 09:12 moritzm: restarting FPM/Apache on mw API servers to pick up expat security updates
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2073.codfw.wmnet with OS bullseye
  • 09:01 moritzm: restarting superset on an-tool1010 to pick up expat security updates
  • 08:52 taavi: UTC morning deploys done
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21774 and previous config saved to /var/cache/conftool/dbconfig/20220303-085125-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21773 and previous config saved to /var/cache/conftool/dbconfig/20220303-085118-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
  • 08:48 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GLAM event: Update wgGECampaigns and wgGECampaignTopics (T301029) (duration: 00m 51s)
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2073.codfw.wmnet with reason: host reimage
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21772 and previous config saved to /var/cache/conftool/dbconfig/20220303-083613-ladsgroup.json
  • 08:34 moritzm: installing expat security updates
  • 08:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2073.codfw.wmnet with OS bullseye
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2073 (T302950)', diff saved to https://phabricator.wikimedia.org/P21771 and previous config saved to /var/cache/conftool/dbconfig/20220303-082842-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2073.codfw.wmnet with reason: Maintenance
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2090 (T302950)', diff saved to https://phabricator.wikimedia.org/P21770 and previous config saved to /var/cache/conftool/dbconfig/20220303-082656-ladsgroup.json
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P21768 and previous config saved to /var/cache/conftool/dbconfig/20220303-082108-ladsgroup.json
  • 08:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4034.ulsfo.wmnet with OS buster
  • 08:18 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add centralauth-suppress to steward and wmf-supportsafety at metawiki (T302675) (duration: 00m 50s)
  • 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2090.codfw.wmnet with OS bullseye
  • 08:13 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: fawiki: Remove the Book namespace (T302957) (duration: 00m 51s)
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21767 and previous config saved to /var/cache/conftool/dbconfig/20220303-080603-ladsgroup.json
  • 08:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
  • 07:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2090.codfw.wmnet with reason: host reimage
  • 07:57 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21766 and previous config saved to /var/cache/conftool/dbconfig/20220303-075534-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4034.ulsfo.wmnet with reason: host reimage
  • 07:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2090.codfw.wmnet with OS bullseye
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2090 (T302950)', diff saved to https://phabricator.wikimedia.org/P21765 and previous config saved to /var/cache/conftool/dbconfig/20220303-074209-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2090.codfw.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21764 and previous config saved to /var/cache/conftool/dbconfig/20220303-073920-ladsgroup.json
  • 07:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 07:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21763 and previous config saved to /var/cache/conftool/dbconfig/20220303-072415-ladsgroup.json
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T302950)', diff saved to https://phabricator.wikimedia.org/P21762 and previous config saved to /var/cache/conftool/dbconfig/20220303-071800-ladsgroup.json
  • 07:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2106.codfw.wmnet with OS bullseye
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P21761 and previous config saved to /var/cache/conftool/dbconfig/20220303-070910-ladsgroup.json
  • 06:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21760 and previous config saved to /var/cache/conftool/dbconfig/20220303-065405-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2106.codfw.wmnet with reason: host reimage
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T300992)', diff saved to https://phabricator.wikimedia.org/P21759 and previous config saved to /var/cache/conftool/dbconfig/20220303-064945-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21758 and previous config saved to /var/cache/conftool/dbconfig/20220303-064937-ladsgroup.json
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2106.codfw.wmnet with OS bullseye
  • 06:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T302950)', diff saved to https://phabricator.wikimedia.org/P21757 and previous config saved to /var/cache/conftool/dbconfig/20220303-063514-ladsgroup.json
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21756 and previous config saved to /var/cache/conftool/dbconfig/20220303-063433-ladsgroup.json
  • 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T302950)', diff saved to https://phabricator.wikimedia.org/P21755 and previous config saved to /var/cache/conftool/dbconfig/20220303-063350-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2119.codfw.wmnet with OS bullseye
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P21754 and previous config saved to /var/cache/conftool/dbconfig/20220303-061928-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
  • 06:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2119.codfw.wmnet with reason: host reimage
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21753 and previous config saved to /var/cache/conftool/dbconfig/20220303-060423-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T300992)', diff saved to https://phabricator.wikimedia.org/P21752 and previous config saved to /var/cache/conftool/dbconfig/20220303-060006-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21751 and previous config saved to /var/cache/conftool/dbconfig/20220303-055959-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2119.codfw.wmnet with OS bullseye
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T302950)', diff saved to https://phabricator.wikimedia.org/P21750 and previous config saved to /var/cache/conftool/dbconfig/20220303-054657-ladsgroup.json
  • 05:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 05:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 05:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21749 and previous config saved to /var/cache/conftool/dbconfig/20220303-054454-ladsgroup.json
  • 05:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T302950)', diff saved to https://phabricator.wikimedia.org/P21748 and previous config saved to /var/cache/conftool/dbconfig/20220303-053324-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P21747 and previous config saved to /var/cache/conftool/dbconfig/20220303-052949-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2136.codfw.wmnet with OS bullseye
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21746 and previous config saved to /var/cache/conftool/dbconfig/20220303-051444-ladsgroup.json
  • 04:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 04:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2136.codfw.wmnet with reason: host reimage
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T300992)', diff saved to https://phabricator.wikimedia.org/P21745 and previous config saved to /var/cache/conftool/dbconfig/20220303-044933-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21744 and previous config saved to /var/cache/conftool/dbconfig/20220303-044926-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2136.codfw.wmnet with OS bullseye
  • 04:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T302950)', diff saved to https://phabricator.wikimedia.org/P21743 and previous config saved to /var/cache/conftool/dbconfig/20220303-043942-ladsgroup.json
  • 04:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 04:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T302950)', diff saved to https://phabricator.wikimedia.org/P21742 and previous config saved to /var/cache/conftool/dbconfig/20220303-043759-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21741 and previous config saved to /var/cache/conftool/dbconfig/20220303-043421-ladsgroup.json
  • 04:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2140.codfw.wmnet with OS bullseye
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P21740 and previous config saved to /var/cache/conftool/dbconfig/20220303-041916-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 04:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2140.codfw.wmnet with reason: host reimage
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21739 and previous config saved to /var/cache/conftool/dbconfig/20220303-040412-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T300992)', diff saved to https://phabricator.wikimedia.org/P21738 and previous config saved to /var/cache/conftool/dbconfig/20220303-035954-ladsgroup.json
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 03:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2140.codfw.wmnet with OS bullseye
  • 03:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T302950)', diff saved to https://phabricator.wikimedia.org/P21737 and previous config saved to /var/cache/conftool/dbconfig/20220303-035328-ladsgroup.json
  • 03:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 10 hosts with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 10 hosts with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21736 and previous config saved to /var/cache/conftool/dbconfig/20220303-035134-ladsgroup.json
  • 03:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2147.codfw.wmnet with OS bullseye
  • 03:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21735 and previous config saved to /var/cache/conftool/dbconfig/20220303-033628-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2147.codfw.wmnet with reason: host reimage
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21734 and previous config saved to /var/cache/conftool/dbconfig/20220303-032123-ladsgroup.json
  • 03:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db2147.codfw.wmnet with OS bullseye
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21733 and previous config saved to /var/cache/conftool/dbconfig/20220303-030618-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T302950)', diff saved to https://phabricator.wikimedia.org/P21732 and previous config saved to /var/cache/conftool/dbconfig/20220303-030518-ladsgroup.json
  • 03:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 03:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T300992)', diff saved to https://phabricator.wikimedia.org/P21731 and previous config saved to /var/cache/conftool/dbconfig/20220303-025500-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 01:42 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
  • 01:42 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on datahubsearch[1001-1003].eqiad.wmnet with reason: Still having errors setting up opensearch
  • 00:31 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dumpsdata1007.eqiad.wmnet
  • 00:31 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:25 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 00:21 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts dumpsdata1007.eqiad.wmnet
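
The cookbook entries above come in START/END pairs, so the run time of each reimage or downtime can be read off the log. The following is a minimal sketch of that pairing, assuming the line layout seen in this archive; the regular expression and the name `cookbook_runs` are illustrative and not part of any Wikimedia tooling. It compares only the HH:MM stamps, so it does not handle runs that cross midnight.

```python
import re
from datetime import datetime

# Hypothetical pattern, written against the cookbook lines in this log only.
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<kind>START|END \((?P<result>\w+)\)) - "
    r"Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=\d+\))?(?P<rest>.*)"
)


def cookbook_runs(lines):
    """Pair each START with the following END for the same user, cookbook and arguments.

    Lines are expected newest-first, as on the wiki page, so they are walked
    in reverse. Yields (cookbook, arguments, minutes, result) tuples.
    """
    open_runs = {}
    for line in reversed(lines):
        m = ENTRY.search(line)
        if not m:
            continue  # not a cookbook entry (dbctl, helmfile, manual notes, ...)
        args = m["rest"].strip()
        key = (m["who"], m["cookbook"], args)
        stamp = datetime.strptime(m["time"], "%H:%M")
        if m["kind"] == "START":
            open_runs[key] = stamp
        elif key in open_runs:
            started = open_runs.pop(key)
            minutes = int((stamp - started).total_seconds() // 60)
            yield m["cookbook"], args, minutes, m["result"]


sample = [
    "16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1148.eqiad.wmnet with OS bullseye",
    "15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1148.eqiad.wmnet with OS bullseye",
]
for run in cookbook_runs(sample):
    print(run)
```

For the two db1148 lines used as the sample, the sketch reports a 31-minute reimage with result PASS. An END with no matching START in the input is skipped.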

2022-03-02

  • 23:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 23:32 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 23:25 ryankemper: T276198 Re-enabled puppet across fleet: `ryankemper@cumin1001:~$ sudo -E cumin 'R:Elasticsearch::instance' 'enable-puppet "deploy fix from T276198"'`
  • 23:21 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:21 ryankemper: T276198 https://gerrit.wikimedia.org/r/c/operations/puppet/+/767600 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/767603/ fixed all the problems. Re-enabling puppet on elastic*, cloudelastic*, and relforge* shortly
  • 23:15 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 23:08 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 22:56 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 22:55 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 22:55 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 22:54 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 22:54 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 22:52 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 22:52 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:51 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:51 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 22:50 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 22:50 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 22:49 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 22:49 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 22:48 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 22:48 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 22:47 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 22:47 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 22:46 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 22:46 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 22:45 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 22:45 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 22:43 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 22:42 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 22:42 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 22:41 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 22:21 ryankemper: T276198 Downtimed `elastic1052` for 2 hours while troubleshooting
  • 22:16 ryankemper: T276198 Testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on `elastic1052`; elasticsearch service fails to start. It's expecting to find `/etc/tmpfiles.d/elasticsearch-production-search-psi-eqiad.conf` but the actual filename is `elasticsearch-production-search-psi-eqiad-conf.conf`. Not sure why that trailing `-conf` is there in the filename. It doesn't look like something `systemd::tmpfile` is doing.
  • 22:05 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:59 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/Linter/includes/Hooks.php: Backport: Hooks.php: Check for non-array $tags (T302918) (duration: 00m 50s)
  • 21:53 ryankemper: T276198 Disabled puppet across all of elastic*, cloudelastic*, and relforge* to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/766876/ on a single elastic host
  • 21:44 mutante: rolling out scap 4.4.2 on 'all' T302919
  • 21:36 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:19 dancy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wmf-config: Undeploy the fawiki test survey from production (T300291) (duration: 00m 50s)
  • 21:13 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: testing scap 4.4.2
  • 21:05 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 21:00 mutante: deploy1002 - upgraded scap to 4.4.2-1 T302919
  • 20:48 mutante: running test-deploy to devcluster (restbase) to test new scap version, successful and then rolled back, as the docs say T302919
  • 20:48 dzahn@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided) (duration: 00m 41s)
  • 20:47 dzahn@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): (no justification provided)
  • 20:44 mutante: tested 'scap pull' still worked on mwdebug1001; rolling out scap 4.4.2 to A:restbase-canary (T302919)
  • 20:38 mutante: rolling out scap 4.4.2 to A:mw-canary or A:parsoid-canary or A:mw-jobrunner-canary (T302919)
  • 20:20 robh@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:11 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:03 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:57 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 19:53 brennen@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 19:47 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:46 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/extensions/ApiFeatureUsage: Backport: Add a non-namespaced alias for ApiFeatureUsageQueryEngineElastica (T302907) (duration: 00m 50s)
  • 19:45 robh@cumin1001: START - Cookbook sre.hosts.provision for host dumpsdata1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:36 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:33 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 19:30 mutante: stopped icinga-wm
  • 19:14 brennen@deploy1002: Synchronized php: group1 wikis to 1.38.0-wmf.24 refs T300200 (duration: 00m 50s)
  • 19:13 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.38.0-wmf.24 refs T300200
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21729 and previous config saved to /var/cache/conftool/dbconfig/20220302-191323-ladsgroup.json
  • 19:10 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group1
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21728 and previous config saved to /var/cache/conftool/dbconfig/20220302-185819-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P21727 and previous config saved to /var/cache/conftool/dbconfig/20220302-184314-ladsgroup.json
  • 18:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21726 and previous config saved to /var/cache/conftool/dbconfig/20220302-182809-ladsgroup.json
  • 18:26 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T300992)', diff saved to https://phabricator.wikimedia.org/P21725 and previous config saved to /var/cache/conftool/dbconfig/20220302-182153-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21724 and previous config saved to /var/cache/conftool/dbconfig/20220302-182145-ladsgroup.json
  • 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 18:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 18:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 18:13 rzl@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 18:13 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:12 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:11 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 18:10 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 18:09 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 18:09 rzl@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21723 and previous config saved to /var/cache/conftool/dbconfig/20220302-180640-ladsgroup.json
  • 17:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P21722 and previous config saved to /var/cache/conftool/dbconfig/20220302-175136-ladsgroup.json
  • 17:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21721 and previous config saved to /var/cache/conftool/dbconfig/20220302-173631-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T300992)', diff saved to https://phabricator.wikimedia.org/P21720 and previous config saved to /var/cache/conftool/dbconfig/20220302-173112-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21719 and previous config saved to /var/cache/conftool/dbconfig/20220302-173104-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21718 and previous config saved to /var/cache/conftool/dbconfig/20220302-171559-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P21717 and previous config saved to /var/cache/conftool/dbconfig/20220302-170055-ladsgroup.json
  • 16:51 vgutierrez: pool cp3061 running HAProxy as TLS termination layer - T290005 T271421
  • 16:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3061.esams.wmnet with OS buster
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21716 and previous config saved to /var/cache/conftool/dbconfig/20220302-164550-ladsgroup.json
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T300992)', diff saved to https://phabricator.wikimedia.org/P21715 and previous config saved to /var/cache/conftool/dbconfig/20220302-163329-ladsgroup.json
  • 16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21714 and previous config saved to /var/cache/conftool/dbconfig/20220302-163322-ladsgroup.json
  • 16:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 16:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3061.esams.wmnet with reason: host reimage
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21713 and previous config saved to /var/cache/conftool/dbconfig/20220302-161817-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P21711 and previous config saved to /var/cache/conftool/dbconfig/20220302-160312-ladsgroup.json
  • 15:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3061.esams.wmnet with OS buster
  • 15:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS buster
  • 15:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21710 and previous config saved to /var/cache/conftool/dbconfig/20220302-154807-ladsgroup.json
  • 15:47 vgutierrez: pool cp5014 running HAProxy as TLS termination layer - T290005 T271421
  • 15:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 15:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T300992)', diff saved to https://phabricator.wikimedia.org/P21709 and previous config saved to /var/cache/conftool/dbconfig/20220302-154039-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21708 and previous config saved to /var/cache/conftool/dbconfig/20220302-154026-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21707 and previous config saved to /var/cache/conftool/dbconfig/20220302-152519-ladsgroup.json
  • 15:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P21706 and previous config saved to /var/cache/conftool/dbconfig/20220302-151015-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21705 and previous config saved to /var/cache/conftool/dbconfig/20220302-145510-ladsgroup.json
  • 14:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS buster
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T300992)', diff saved to https://phabricator.wikimedia.org/P21704 and previous config saved to /var/cache/conftool/dbconfig/20220302-145054-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21703 and previous config saved to /var/cache/conftool/dbconfig/20220302-145046-ladsgroup.json
  • 14:41 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:38 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21702 and previous config saved to /var/cache/conftool/dbconfig/20220302-143541-ladsgroup.json
  • 14:34 vgutierrez@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4034.ulsfo.wmnet with OS buster
  • 14:27 moritzm: rebalance VMs in Ganeti row A after adding new servers (and decommissioning old ones)
  • 14:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 14:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 14:21 ladsgroup@deploy1002: Synchronized php-1.38.0-wmf.23/extensions/FlaggedRevs/modules/ext.flaggedRevs.review/review.js: Backport: ext.flaggedRevs.review: Restore tolerance when setting "disabled" prop (duration: 00m 52s)
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P21701 and previous config saved to /var/cache/conftool/dbconfig/20220302-142037-ladsgroup.json
  • 14:13 mmandere: pool cp6013
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21700 and previous config saved to /var/cache/conftool/dbconfig/20220302-140532-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21699 and previous config saved to /var/cache/conftool/dbconfig/20220302-140112-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21698 and previous config saved to /var/cache/conftool/dbconfig/20220302-140105-ladsgroup.json
  • 13:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21697 and previous config saved to /var/cache/conftool/dbconfig/20220302-134600-ladsgroup.json
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P21696 and previous config saved to /var/cache/conftool/dbconfig/20220302-133055-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21695 and previous config saved to /var/cache/conftool/dbconfig/20220302-131550-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T300992)', diff saved to https://phabricator.wikimedia.org/P21694 and previous config saved to /var/cache/conftool/dbconfig/20220302-131032-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21693 and previous config saved to /var/cache/conftool/dbconfig/20220302-131024-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21692 and previous config saved to /var/cache/conftool/dbconfig/20220302-125519-ladsgroup.json
  • 12:47 reedy@deploy1002: Finished scap: Fix MassMessage translations T302840 (duration: 01m 50s)
  • 12:45 reedy@deploy1002: Started scap: Fix MassMessage translations T302840
  • 12:43 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4034.ulsfo.wmnet with OS buster
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P21690 and previous config saved to /var/cache/conftool/dbconfig/20220302-124014-ladsgroup.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21689 and previous config saved to /var/cache/conftool/dbconfig/20220302-122510-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T300992)', diff saved to https://phabricator.wikimedia.org/P21688 and previous config saved to /var/cache/conftool/dbconfig/20220302-122049-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 12 hosts with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21687 and previous config saved to /var/cache/conftool/dbconfig/20220302-121754-ladsgroup.json
  • 12:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp4034.ulsfo.wmnet with OS buster
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21686 and previous config saved to /var/cache/conftool/dbconfig/20220302-120250-ladsgroup.json
  • 11:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P21685 and previous config saved to /var/cache/conftool/dbconfig/20220302-114745-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21684 and previous config saved to /var/cache/conftool/dbconfig/20220302-113240-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T300992)', diff saved to https://phabricator.wikimedia.org/P21683 and previous config saved to /var/cache/conftool/dbconfig/20220302-112824-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21682 and previous config saved to /var/cache/conftool/dbconfig/20220302-112347-ladsgroup.json
  • 11:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e" (duration: 01m 29s)
  • 11:22 mbsantos: rollback maps eqiad to a previous working state to mitigate geoshape errors
  • 11:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@3dc404c] (eqiad): Merge "Update kartotherian-package to f239c6e"
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21681 and previous config saved to /var/cache/conftool/dbconfig/20220302-110842-ladsgroup.json
  • 11:05 moritzm: installing expat security updates
  • 10:56 moritzm: restarting apache2 and mailman3-web on lists.wikimedia.org for expat security update
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P21680 and previous config saved to /var/cache/conftool/dbconfig/20220302-105336-ladsgroup.json
  • 10:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21678 and previous config saved to /var/cache/conftool/dbconfig/20220302-103832-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T300992)', diff saved to https://phabricator.wikimedia.org/P21677 and previous config saved to /var/cache/conftool/dbconfig/20220302-103407-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:20 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 10:15 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 10:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging" (duration: 01m 45s)
  • 10:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 10:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (eqiad): Revert "Temporarily increase poolsize for debugging"
  • 10:13 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging" (duration: 01m 36s)
  • 10:11 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@d049589] (codfw): Revert "Temporarily increase poolsize for debugging"
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21676 and previous config saved to /var/cache/conftool/dbconfig/20220302-100903-ladsgroup.json
  • 10:04 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2002.codfw.wmnet
  • 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:55 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21675 and previous config saved to /var/cache/conftool/dbconfig/20220302-095358-ladsgroup.json
  • 09:51 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging (duration: 04m 26s)
  • 09:49 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 09:49 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2002.codfw.wmnet
  • 09:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-ctrl2001.codfw.wmnet
  • 09:47 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (codfw): Temporarily increase poolsize for debugging
  • 09:46 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging (duration: 02m 13s)
  • 09:44 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@fd6bc59] (eqiad): Temporarily increase poolsize for debugging
  • 09:39 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P21674 and previous config saved to /var/cache/conftool/dbconfig/20220302-093853-ladsgroup.json
  • 09:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 09:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-ctrl2001.codfw.wmnet
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21673 and previous config saved to /var/cache/conftool/dbconfig/20220302-093027-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21672 and previous config saved to /var/cache/conftool/dbconfig/20220302-092348-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T300992)', diff saved to https://phabricator.wikimedia.org/P21671 and previous config saved to /var/cache/conftool/dbconfig/20220302-092128-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21670 and previous config saved to /var/cache/conftool/dbconfig/20220302-092120-ladsgroup.json
  • 09:16 mmandere: rolling restart of varnishkafka-* on cp6*
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21669 and previous config saved to /var/cache/conftool/dbconfig/20220302-091523-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21668 and previous config saved to /var/cache/conftool/dbconfig/20220302-090615-ladsgroup.json
  • 09:05 XioNoX: push Capirca managed labs-in firewall filter to eqiad routers
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P21667 and previous config saved to /var/cache/conftool/dbconfig/20220302-090018-ladsgroup.json
  • 08:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100', diff saved to https://phabricator.wikimedia.org/P21666 and previous config saved to /var/cache/conftool/dbconfig/20220302-085111-ladsgroup.json
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21665 and previous config saved to /var/cache/conftool/dbconfig/20220302-084513-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1167.eqiad.wmnet with OS bullseye
  • 08:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21664 and previous config saved to /var/cache/conftool/dbconfig/20220302-083606-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1100 (T300992)', diff saved to https://phabricator.wikimedia.org/P21663 and previous config saved to /var/cache/conftool/dbconfig/20220302-083345-ladsgroup.json
  • 08:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21662 and previous config saved to /var/cache/conftool/dbconfig/20220302-083338-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
  • 08:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1167.eqiad.wmnet with reason: host reimage
  • 08:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21661 and previous config saved to /var/cache/conftool/dbconfig/20220302-081832-ladsgroup.json
  • 08:09 godog: test thanos 0.24.0 on thanos-fe2001 to check if https://github.com/thanos-io/thanos/issues/4531 is fixed
  • 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1167.eqiad.wmnet with OS bullseye
  • 08:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P21660 and previous config saved to /var/cache/conftool/dbconfig/20220302-080327-ladsgroup.json
  • 08:02 Amir1: killing all entity dumpers of wikidata in snapshot1008 (T300255)
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21659 and previous config saved to /var/cache/conftool/dbconfig/20220302-074822-ladsgroup.json
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21658 and previous config saved to /var/cache/conftool/dbconfig/20220302-074602-ladsgroup.json
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T302185)', diff saved to https://phabricator.wikimedia.org/P21657 and previous config saved to /var/cache/conftool/dbconfig/20220302-074210-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21656 and previous config saved to /var/cache/conftool/dbconfig/20220302-073610-ladsgroup.json
  • 07:35 _joe_: filling request patterns in etcd
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21655 and previous config saved to /var/cache/conftool/dbconfig/20220302-072105-ladsgroup.json
  • 07:09 _joe_: installing scap 4.4.1 everywhere T302464
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P21654 and previous config saved to /var/cache/conftool/dbconfig/20220302-070601-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21653 and previous config saved to /var/cache/conftool/dbconfig/20220302-065056-ladsgroup.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21652 and previous config saved to /var/cache/conftool/dbconfig/20220302-063933-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21651 and previous config saved to /var/cache/conftool/dbconfig/20220302-062428-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P21650 and previous config saved to /var/cache/conftool/dbconfig/20220302-060924-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21649 and previous config saved to /var/cache/conftool/dbconfig/20220302-055419-ladsgroup.json
  • 05:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1101.eqiad.wmnet with OS bullseye
  • 05:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
  • 05:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1101.eqiad.wmnet with reason: host reimage
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1101.eqiad.wmnet with OS bullseye
  • 05:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T302185)', diff saved to https://phabricator.wikimedia.org/P21648 and previous config saved to /var/cache/conftool/dbconfig/20220302-052033-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21647 and previous config saved to /var/cache/conftool/dbconfig/20220302-051947-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T302185)', diff saved to https://phabricator.wikimedia.org/P21646 and previous config saved to /var/cache/conftool/dbconfig/20220302-051853-ladsgroup.json
  • 05:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21645 and previous config saved to /var/cache/conftool/dbconfig/20220302-050526-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21644 and previous config saved to /var/cache/conftool/dbconfig/20220302-050442-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21643 and previous config saved to /var/cache/conftool/dbconfig/20220302-045021-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P21642 and previous config saved to /var/cache/conftool/dbconfig/20220302-044938-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21641 and previous config saved to /var/cache/conftool/dbconfig/20220302-043516-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21640 and previous config saved to /var/cache/conftool/dbconfig/20220302-043433-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21639 and previous config saved to /var/cache/conftool/dbconfig/20220302-043313-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21638 and previous config saved to /var/cache/conftool/dbconfig/20220302-043229-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21637 and previous config saved to /var/cache/conftool/dbconfig/20220302-042012-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21636 and previous config saved to /var/cache/conftool/dbconfig/20220302-041725-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1104.eqiad.wmnet with OS bullseye
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P21635 and previous config saved to /var/cache/conftool/dbconfig/20220302-040220-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
  • 03:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1104.eqiad.wmnet with reason: host reimage
  • 03:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.reimage for host db1104.eqiad.wmnet with OS bullseye
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21634 and previous config saved to /var/cache/conftool/dbconfig/20220302-034715-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21633 and previous config saved to /var/cache/conftool/dbconfig/20220302-034502-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T300992)', diff saved to https://phabricator.wikimedia.org/P21632 and previous config saved to /var/cache/conftool/dbconfig/20220302-034454-ladsgroup.json
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 03:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 03:43 ejegg: updated CiviCRM from e9f0eff5 to cb0605ed
  • 02:13 ejegg: Fundraising CiviCRM updated from 2874d623 to e9f0eff5
  • 00:15 topranks: Re-enabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.
  • 00:07 topranks: disabling Lumen AS3356 BGP session over IPv4 on cr3-ulsfo to assess effect on currently broken routing to ulsfo.

2022-03-01

  • 22:51 inflatador: T276198 reenabled puppet on elastic1052.eqiad.wmnet
  • 22:37 inflatador: T276198 rebooting elastic1052.eqiad.wmnet to test failure condition
  • 22:33 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
  • 22:33 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp6016.drmrs.wmnet with reason: debugging till we find the root cause of the purged OOM issue; no traffic served
  • 22:32 inflatador: T276198 disabling puppet on elastic1052.eqiad.wmnet to test failure condition (rebooting shortly)
  • 21:53 dancy@deploy1002: Finished scap: Resync to try to clear alerts (duration: 12m 08s)
  • 21:41 dancy@deploy1002: Started scap: Resync to try to clear alerts
  • 21:36 dancy@deploy1002: Started scap: Resync to try to clear alerts
  • 20:36 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.38.0-wmf.24 refs T300200
  • 20:33 brennen: 1.38.0-wmf.24 train (T300200): no current blockers; proceeding to group0; note this may briefly trigger some version alerts
  • 20:30 brennen@deploy1002: Synchronized php-1.38.0-wmf.24/includes: Backport: Revert "preferences: Use a faster and simpler form descriptor when validating" (T302643) (duration: 00m 55s)
  • 20:05 mutante: alert1001 - re-enabled puppet
  • 20:05 brennen@deploy1002: Finished scap: testwikis wikis to 1.38.0-wmf.24 refs T300200 (duration: 53m 17s)
  • 19:45 mutante: alert1001 - disable puppet, systemctl stop ircecho - to stop bot spam, caused somehow by new scap version breaking "mw versions mismatch" alerting - affects labtestwiki,testwiki,testwikidatawiki
  • 19:38 mutante: mw1449 - scap pull
  • 19:36 mutante: mw1414 - scap pull
  • 19:11 brennen@deploy1002: Started scap: testwikis wikis to 1.38.0-wmf.24 refs T300200
  • 19:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2008.codfw.wmnet
  • 19:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 18:57 brennen: 1.38.0-wmf.24 train (T300200): there's currently a single blocker at T302643; staging to testwikis and holding there until backport's available
  • 18:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2008.codfw.wmnet
  • 18:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:45 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti2008.codfw.wmnet with reason: Remove from Ganeti cluster for decom
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21626 and previous config saved to /var/cache/conftool/dbconfig/20220301-180216-ladsgroup.json
  • 17:52 cwhite: completed grafana upgrade in eqiad T282863
  • 17:50 herron: re-enabling puppet and ircecho on alert1001
  • 17:47 cwhite: upgrade grafana in eqiad T282863
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21625 and previous config saved to /var/cache/conftool/dbconfig/20220301-174711-ladsgroup.json
  • 17:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P21624 and previous config saved to /var/cache/conftool/dbconfig/20220301-173206-ladsgroup.json
  • 17:24 dancy@deploy1002: Finished scap: testing container image build (duration: 28m 39s)
  • 17:17 herron: stopped ircecho on alert1001 due to systemd unit alert shower
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21622 and previous config saved to /var/cache/conftool/dbconfig/20220301-171701-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T300992)', diff saved to https://phabricator.wikimedia.org/P21621 and previous config saved to /var/cache/conftool/dbconfig/20220301-171441-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:55 dancy@deploy1002: Started scap: testing container image build
  • 16:24 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 03s)
  • 16:23 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
  • 16:12 moritzm: restarting apache on logstash nodes to pick up expat update
  • 16:11 elukey@deploy1002: Finished deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195 (duration: 36m 13s)
  • 16:05 moritzm: restarting nginx on wcqs* nodes to pick up expat update
  • 15:35 elukey@deploy1002: Started deploy [ores/deploy@29de1cc]: ORES Winter deployment - T300195
  • 15:21 ntsako@deploy1002: Finished deploy [airflow-dags/analytics@cac16e8]: (no justification provided) (duration: 00m 07s)
  • 15:21 ntsako@deploy1002: Started deploy [airflow-dags/analytics@cac16e8]: (no justification provided)
  • 15:06 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2003.codfw.wmnet
  • 14:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 elukey: elukey@deploy1002:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the node)
  • 14:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:51 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 14:48 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2002.codfw.wmnet
  • 14:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:38 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 vgutierrez: pool cp1087 running HAProxy as TLS termination layer - T290005 T271421
  • 14:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1087.eqiad.wmnet with OS buster
  • 14:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 14:32 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
  • 14:32 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ml-staging-etcd2001.codfw.wmnet
  • 14:19 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:19 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 14:14 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 14:09 moritzm: restarting nginx on wdqs* nodes to pick up expat update
  • 14:03 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
  • 14:03 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:53 mmandere: restart purged on cp60[15-16]
  • 13:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:48 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:48 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 13:48 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2002.codfw.wmnet
  • 13:48 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: host reimage
  • 13:44 klausman@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ml-staging-etcd2003.codfw.wmnet
  • 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:43 klausman@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:40 kormat: Deploying wmfmariadbpy 0.9 T302796
  • 13:40 kormat: uploaded wmfmariadbpy 0.9 to apt.wm.o T302796
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2003.codfw.wmnet
  • 13:39 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 13:39 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2002.codfw.wmnet
  • 13:32 moritzm: restarting nginx on registry* nodes to pick up expat update
  • 13:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp1087.eqiad.wmnet with OS buster
  • 13:15 XioNoX: restart cr1-drmrs for software upgrade
  • 13:03 moritzm: restarting FPM/Apache on parsoid hosts to pick up expat update
  • 12:50 vgutierrez: pool cp3062 running HAProxy as TLS termination layer - T290005 T271421
  • 12:47 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3062.esams.wmnet with OS buster
  • 12:39 moritzm: installing expat security updates
  • 12:34 mmandere: restart purged on cp60[12-14]
  • 12:32 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker (duration: 01m 06s)
  • 12:31 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (eqiad): Reduce pool size to 1 connection per node worker
  • 12:30 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker (duration: 01m 30s)
  • 12:28 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@41d2498] (codfw): Reduce pool size to 1 connection per node worker
  • 12:15 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration (duration: 01m 41s)
  • 12:13 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (codfw): Fix pool size configuration
  • 12:11 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration (duration: 02m 01s)
  • 12:09 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@51d5a07] (eqiad): Fix pool size configuration
  • 11:43 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:36 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:35 klausman@cumin2002: START - Cookbook sre.dns.netbox
  • 11:35 klausman@cumin2002: START - Cookbook sre.ganeti.makevm for new host ml-staging-etcd2001.codfw.wmnet
  • 11:33 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:32 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:30 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:28 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:27 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:27 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:21 _joe_: restarted pybal, removed ipvsadm entry on lvs1019. Now all of MediaWiki has no http LVS endpoint available. T244843
  • 11:18 _joe_: also removed the ipvsadm entry for apaches:80 T244843
  • 11:17 jayme: rolled back linkrecommendation staging helm release to revision 12 - T302744
  • 11:17 _joe_: restarting pybal on lvs1020 T244843
  • 11:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 11:11 _joe_: restarted pybal on lvs2009, T244843
  • 11:09 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3062.esams.wmnet with reason: host reimage
  • 11:07 _joe_: restarted pybal on lvs2010, T244843
  • 11:02 mmandere: restart purged on cp60[09,10,11]
  • 11:00 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1148.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:47 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1147.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp3062.esams.wmnet with OS buster
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 259 hosts
  • 10:40 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 259 hosts
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ema out of all services on: 1353 hosts
  • 10:39 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ema out of all services on: 1353 hosts
  • 10:31 mmandere: restart purged on cp600[6-8]
  • 10:28 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:05 vgutierrez: pool cp2039 running HAProxy as TLS termination layer - T290005 T271421
  • 09:48 elukey: elukey@stat1004:~$ sudo kill `pgrep -u zpapierski` (offboarded user, puppet broken on the host)
  • 09:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2039.codfw.wmnet with OS buster
  • 09:33 _joe_: restarted pybal on lvs1019, removed the mw api from ipvsadm, the mw api is internally fully encrypted
  • 09:31 _joe_: restart pybal on lvs1020
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Amuigai out of all services on: 1881 hosts
  • 09:25 elukey: restart varnishkafka-webrequest on cp6009 as attempt to clear a weird status of librdkafka (delivery errors to kafka)
  • 09:25 _joe_: manually removed ipvs entries on lvs2*, so it is actually now that the http api is not available in codfw anymore
  • 09:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Amuigai out of all services on: 1881 hosts
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ZPapierski out of all services on: 1881 hosts
  • 09:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging ZPapierski out of all services on: 1881 hosts
  • 09:22 _joe_: restarted pybal on lvs2009, the mw api is now effectively https-only in codfw T287820
  • 09:20 _joe_: restarted pybal on lvs2010
  • 09:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 09:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2039.codfw.wmnet with reason: host reimage
  • 09:06 elukey: restart purged on cp6005
  • 08:57 elukey: restart purged on cp6004
  • 08:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.reimage for host cp2039.codfw.wmnet with OS buster
  • 08:27 urbanecm: UTC morning B&C window done
  • 08:25 elukey: restart purged on cp6003
  • 08:16 moritzm: drain instances off ganeti2008 for eventual decom
  • 08:08 urbanecm@deploy1002: Synchronized wmf-config/ProductionServices.php: d149208: Use service-proxy to connect to linkrecommendation (T302719) (duration: 00m 49s)
  • 07:59 elukey: restart purged on cp6002
  • 06:58 oblivian@deploy1002: Finished deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test (duration: 00m 17s)
  • 06:57 oblivian@deploy1002: Started deploy [restbase/deploy@0848b15] (dev-cluster): T302464 test
  • 06:56 elukey: restart purged on cp6001 to clear stale kafka TLS consumer state (or attempting to)
  • 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye} T302464
  • 06:46 _joe_: uploaded scap 4.4.1 to {stretch,buster,bullseye}
  • 02:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21618 and previous config saved to /var/cache/conftool/dbconfig/20220301-025938-ladsgroup.json
  • 02:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21617 and previous config saved to /var/cache/conftool/dbconfig/20220301-024433-ladsgroup.json
  • 02:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P21616 and previous config saved to /var/cache/conftool/dbconfig/20220301-022928-ladsgroup.json
  • 02:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21615 and previous config saved to /var/cache/conftool/dbconfig/20220301-021424-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T302185)', diff saved to https://phabricator.wikimedia.org/P21614 and previous config saved to /var/cache/conftool/dbconfig/20220301-011404-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 00:17 mutante: 15.wikipedia.org on k8s (staging) deploy1002:~] $ curl -s --resolve "15.wikipedia.org:4111:staging.svc.eqiad.wmnet" 'https://15.wikipedia.org' | grep grandpa => "“Wikipedia is like an all-knowing grandpa.”" | T300171

