Server Admin Log/Archive 56

2022-08-31

  • 23:31 krinkle@deploy1002: Synchronized wmf-config/: I493b5e4662 (duration: 03m 43s)
  • 23:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:17 krinkle@deploy1002: Synchronized private/: (no justification provided) (duration: 03m 42s)
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:13 krinkle@deploy1002: Synchronized wmf-config/: Ibdac0a (duration: 03m 44s)
  • 23:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:12 Krinkle: krinkle@deploy1002 Change /srv/mediawiki-staging/private to remove wmgElectronSecret
  • 23:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:45 ejegg: payments-wiki upgraded from 80657b06 to 648842f9
  • 22:38 eileen: config revision changed from 4331ef59 to d3696af7
  • 22:00 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1030.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 21:59 mutante: etherpad (etherpad1003) - rebooting for maintenance
  • 21:58 mutante: mw1383 start php7.2-fpm_check_restart.service
  • 21:52 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1029.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 21:48 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1030.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 21:42 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1029.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 21:42 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 21:42 ebernhardson: run search index creation for pcmwiki
  • 21:41 ebernhardson: run search index creation for bjnwiktionary
  • 21:40 ebernhardson: run search index creation for guwwiktionary
  • 21:32 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Restart to apply new certificates (T316697) - eevans@cumin1001
  • 21:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
  • 21:30 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
  • 21:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
  • 21:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on etherpad1003.eqiad.wmnet with reason: kernel upgrade
  • 21:00 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@94b160c]: drop_old_data: Add new required param --allowed-interval (duration: 02m 07s)
  • 20:58 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@94b160c]: drop_old_data: Add new required param --allowed-interval
  • 20:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 20:43 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 20:39 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 20:38 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.upgrade-and-reboot (exit_code=0)
  • 20:36 eileen: config revision changed from b1bd9422 to 4331ef59
  • 20:31 denisse@cumin1001: START - Cookbook sre.hosts.upgrade-and-reboot
  • m: rebooting netmon1003 for a kernel upgrade
  • 20:23 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 bking@cumin1001: conftool action : get/pooled; selector: dnsdisc=wdqs,name=codfw
  • 19:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 19:41 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 19:37 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 19:30 ryankemper: T316719 Rolling upgrade operation complete; all of elastic codfw is now on `7.10.2`. Next week our related cirrus changes will go out with the mediawiki deploy train in `1.39.0-wmf.28`
  • 19:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 19:21 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T314041)', diff saved to https://phabricator.wikimedia.org/P33725 and previous config saved to /var/cache/conftool/dbconfig/20220831-192120-ladsgroup.json
  • 19:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T314041)', diff saved to https://phabricator.wikimedia.org/P33724 and previous config saved to /var/cache/conftool/dbconfig/20220831-192032-ladsgroup.json
  • 19:18 dzahn@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gitlab2003.wikimedia.org with OS bullseye
  • 19:16 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 19:15 mutante: gitlab: reimaging gitlab2003 with cookbook after reverting partman change and comment on gerrit:827578 T274463
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33723 and previous config saved to /var/cache/conftool/dbconfig/20220831-190526-ladsgroup.json
  • 18:56 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33722 and previous config saved to /var/cache/conftool/dbconfig/20220831-185020-ladsgroup.json
  • 18:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T314041)', diff saved to https://phabricator.wikimedia.org/P33721 and previous config saved to /var/cache/conftool/dbconfig/20220831-183513-ladsgroup.json
  • 18:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:27 dduvall@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.27 refs T314188 (duration: 03m 37s)
  • 18:24 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.27 refs T314188
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:06 volans: installing spicerack 3.2.1 on cumin1001
  • 16:52 volans: installing spicerack 3.2.1 on cumin2002
  • 16:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet,service=varnish-fe
  • 16:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet,service=ats-be
  • 16:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6016.drmrs.wmnet,service=ats-tls
  • 16:00 volans: uploaded spicerack_3.2.1 to apt.wikimedia.org bullseye-wikimedia
  • 15:57 _joe_: updated php 7.4 in all of production T316691
  • 15:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS buster
  • 15:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2015.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2015.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 15:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
  • 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS buster
  • 15:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6016.drmrs.wmnet,service=varnish-fe
  • 15:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6016.drmrs.wmnet,service=ats-be
  • 15:06 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6016.drmrs.wmnet,service=ats-tls
  • 15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet,service=varnish-fe
  • 15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet,service=ats-be
  • 15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet,service=ats-tls
  • 15:01 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl2001.codfw.wmnet
  • 15:01 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl2001.codfw.wmnet
  • 15:00 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS buster
  • 14:56 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 14:56 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 14:56 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve-ctrl2002.codfw.wmnet
  • 14:56 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-serve-ctrl2002.codfw.wmnet
  • 14:50 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 14:50 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 14:49 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 14:48 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 14:47 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 14:47 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 14:43 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 14:43 jayme@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 14:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 14:42 jayme@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 14:41 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 14:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 14:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
  • 14:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
  • 14:11 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet,service=varnish-fe
  • 14:11 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet,service=ats-be
  • 14:11 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet,service=ats-tls
  • 14:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp6008.drmrs.wmnet with OS buster
  • 14:08 vgutierrez: deploy trafficserver: Hide non session cookies during cache lookup globally - T316338
  • 14:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS buster
  • 13:49 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:47 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikidata.php: Config: Remove unused assignments from SearchSettingsForWikibase.php (2/2) (duration: 03m 33s)
  • 13:43 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Remove unused assignments from SearchSettingsForWikibase.php (1/2) (duration: 03m 38s)
  • 13:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikidata.php: Config: Directly set WikibaseCirrusSearch settings in IS.php (3/3) (duration: 03m 42s)
  • 13:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 13:31 moritzm: restarting exim on the MXes to pick up zlib update
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Directly set WikibaseCirrusSearch settings in IS.php (2/3) (duration: 03m 39s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Directly set WikibaseCirrusSearch settings in IS.php (1/3) (duration: 03m 47s)
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Only set WikibaseCirrusSearch settings if wmg globals are set (duration: 03m 42s)
  • 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 13:13 moritzm: installing zlib security updates on bullseye
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings.php: Enable Realtime Preview on Group 2 (T314828) (duration: 03m 54s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:07 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2002.codfw.wmnet
  • 12:57 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in drmrs - T316338
  • 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2002.codfw.wmnet
  • 12:39 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-staging-ctrl2002.codfw.wmnet
  • 12:39 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-staging-ctrl2002.codfw.wmnet
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 12:34 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:34 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:33 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-staging-ctrl2001.codfw.wmnet
  • 12:33 klausman@cumin1001: START - Cookbook sre.hosts.remove-downtime for ml-staging-ctrl2001.codfw.wmnet
  • 12:28 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 12:28 klausman@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: Reboot to pick up kernel 5.10.136 (T316185)
  • 12:27 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
  • 12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 12:17 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
  • 12:16 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 12:06 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 11:58 marostegui: Reboot sanitarium hosts, lag will appear on clouddb* hosts
  • 11:49 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host webperf1004.eqiad.wmnet
  • 11:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1004.eqiad.wmnet
  • 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 11:27 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 11:27 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 11:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 11:22 moritzm: draining ganeti2015 for eventual reimage T311686
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
  • 11:04 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338
  • 11:00 _joe_: updating php 7.4 packages in wikimedia/buster T316601
  • 10:42 _joe_: updating php 7.4 on mwdebug1002 to test the new patched packages T316601
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33715 and previous config saved to /var/cache/conftool/dbconfig/20220831-100853-root.json
  • 10:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1002.eqiad.wmnet with OS buster
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33714 and previous config saved to /var/cache/conftool/dbconfig/20220831-095348-root.json
  • 09:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2003.codfw.wmnet
  • 09:44 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2003.codfw.wmnet
  • 09:44 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33713 and previous config saved to /var/cache/conftool/dbconfig/20220831-093844-root.json
  • 09:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 09:34 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-fe2001.codfw.wmnet
  • 09:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
  • 09:29 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
  • 09:27 moritzm: installing docker.io bugfix updates from Bullseye point release
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33712 and previous config saved to /var/cache/conftool/dbconfig/20220831-092339-root.json
  • 09:22 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 09:17 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host parse1002.eqiad.wmnet with OS buster
  • 09:17 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 09:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33711 and previous config saved to /var/cache/conftool/dbconfig/20220831-090834-root.json
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33710 and previous config saved to /var/cache/conftool/dbconfig/20220831-085329-root.json
  • 08:51 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 08:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 08:39 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 4%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33709 and previous config saved to /var/cache/conftool/dbconfig/20220831-083824-root.json
  • 08:32 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 08:28 moritzm: upgrading ganeti2016/ganeti2018 to 3.0.2 T312637
  • 08:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 24 hosts with reason: Downtiming php7.4 parsoid servers until they are ready to pool
  • 08:27 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 24 hosts with reason: Downtiming php7.4 parsoid servers until they are ready to pool
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 3%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33708 and previous config saved to /var/cache/conftool/dbconfig/20220831-082319-root.json
  • 08:20 vgutierrez: end test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338
  • 08:12 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 2%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33707 and previous config saved to /var/cache/conftool/dbconfig/20220831-080815-root.json
  • 07:54 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host prometheus2006.codfw.wmnet
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: Repooling after upgrade', diff saved to https://phabricator.wikimedia.org/P33706 and previous config saved to /var/cache/conftool/dbconfig/20220831-075310-root.json
  • 07:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2022.codfw.wmnet to cluster codfw and group B
  • 07:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 07:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2022.codfw.wmnet to cluster codfw and group B
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 for upgrade', diff saved to https://phabricator.wikimedia.org/P33705 and previous config saved to /var/cache/conftool/dbconfig/20220831-074748-root.json
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 07:40 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 07:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 07:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 07:15 godog: bounce thanos-compact on thanos-fe2001
  • 05:00 marostegui: Failover m3 from db1183 to db1159 - T316506
  • 04:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1195].eqiad.wmnet with reason: switchover m1 T316506
  • 04:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2132,2160].codfw.wmnet,db[1117,1195].eqiad.wmnet with reason: switchover m1 T316506
  • 03:23 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 03:23 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 03:17 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 02:50 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 02:49 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 00:15 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 00:14 ryankemper: T316719 First elastic host upgraded properly. Cancelling cookbook to kick off a new rolling upgrade that will go 3 nodes at a time (first run was just one host as a sanity check)
  • 00:14 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 00:08 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719

2022-08-30

  • 23:55 ryankemper: T316719 Merged https://phabricator.wikimedia.org/T316719; running puppet across codfw fleet: `ryankemper@cumin2002:~$ sudo -E cumin -b 6 'A:elastic-codfw' 'run-puppet-agent'`
  • 23:50 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 23:50 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 22:02 eileen: civicrm upgraded from a31c7590 to 76308ffb
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T314041)', diff saved to https://phabricator.wikimedia.org/P33703 and previous config saved to /var/cache/conftool/dbconfig/20220830-210218-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 20:43 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 20:43 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw es7 cluster upgrade - ryankemper@cumin2002 - T316719
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 03m 43s)
  • 19:45 ryankemper: [Relforge] `ryankemper@cumin1001:~$ sudo -E cumin '*relforge*' 'run-puppet-agent --force'`
  • 18:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:15 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.27 refs T314188
  • 17:46 joal@deploy1002: Finished deploy [analytics/refinery@aa8f88f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aa8f88f] (duration: 08m 19s)
  • 17:38 joal@deploy1002: Started deploy [analytics/refinery@aa8f88f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@aa8f88f]
  • 17:38 joal@deploy1002: Finished deploy [analytics/refinery@aa8f88f] (thin): Regular analytics weekly train THIN [analytics/refinery@aa8f88f] (duration: 00m 08s)
  • 17:37 joal@deploy1002: Started deploy [analytics/refinery@aa8f88f] (thin): Regular analytics weekly train THIN [analytics/refinery@aa8f88f]
  • 17:37 joal@deploy1002: Finished deploy [analytics/refinery@aa8f88f]: Regular analytics weekly train [analytics/refinery@aa8f88f] (duration: 26m 10s)
  • 17:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:12 moritzm: installing logrotate security updates on Bullseye
  • 17:11 joal@deploy1002: Started deploy [analytics/refinery@aa8f88f]: Regular analytics weekly train [analytics/refinery@aa8f88f]
  • 17:08 dduvall@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.27 refs T314188 (duration: 39m 07s)
  • 17:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2022.codfw.wmnet with OS bullseye
  • 17:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 16:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 16:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 16:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 16:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2022.codfw.wmnet with reason: host reimage
  • 16:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 16:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2022.codfw.wmnet with reason: host reimage
  • 16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
  • 16:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T316186)', diff saved to https://phabricator.wikimedia.org/P33701 and previous config saved to /var/cache/conftool/dbconfig/20220830-163619-ladsgroup.json
  • 16:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
  • 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2022.codfw.wmnet with OS bullseye
  • 16:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
  • 16:29 dduvall@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.27 refs T314188
  • 16:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P33700 and previous config saved to /var/cache/conftool/dbconfig/20220830-162113-ladsgroup.json
  • 16:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet
  • 16:12 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1005.eqiad.wmnet
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P33699 and previous config saved to /var/cache/conftool/dbconfig/20220830-160607-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T316186)', diff saved to https://phabricator.wikimedia.org/P33698 and previous config saved to /var/cache/conftool/dbconfig/20220830-155101-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T316186)', diff saved to https://phabricator.wikimedia.org/P33697 and previous config saved to /var/cache/conftool/dbconfig/20220830-154337-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33696 and previous config saved to /var/cache/conftool/dbconfig/20220830-154314-ladsgroup.json
  • 15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:32 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:32 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P33695 and previous config saved to /var/cache/conftool/dbconfig/20220830-152807-ladsgroup.json
  • 15:25 vgutierrez: restarting ats in cp6008
  • 15:25 vgutierrez: restarting ats in cp6007
  • 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 15:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2022.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2022.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 15:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2002.codfw.wmnet
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P33694 and previous config saved to /var/cache/conftool/dbconfig/20220830-151301-ladsgroup.json
  • 15:10 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 15:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2002.codfw.wmnet
  • 15:09 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 15:07 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 15:06 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 15:06 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33693 and previous config saved to /var/cache/conftool/dbconfig/20220830-145755-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33691 and previous config saved to /var/cache/conftool/dbconfig/20220830-145035-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33690 and previous config saved to /var/cache/conftool/dbconfig/20220830-145011-ladsgroup.json
  • 14:45 cmooney@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1439.eqiad.wmnet
  • 14:45 cmooney@cumin1001: conftool action : set/pooled=yes; selector: cluster=jobrunner,name=mw1437.eqiad.wmnet
  • 14:39 cmooney@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1440.eqiad.wmnet
  • 14:39 cmooney@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1439.eqiad.wmnet
  • 14:39 cmooney@cumin1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1437.eqiad.wmnet
  • 14:38 topranks: de-pooling mw1437/mw1439/mw1440 from jobrunner cluster as those hosts are busy running videoscaler tasks
  • 14:35 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1439.eqiad.wmnet
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P33689 and previous config saved to /var/cache/conftool/dbconfig/20220830-143505-ladsgroup.json
  • 14:33 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=videoscaler,name=mw1437.eqiad.wmnet
  • 14:28 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1438.eqiad.wmnet
  • 14:27 herron@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=jobrunner,name=mw1438
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P33688 and previous config saved to /var/cache/conftool/dbconfig/20220830-141959-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33687 and previous config saved to /var/cache/conftool/dbconfig/20220830-140452-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33686 and previous config saved to /var/cache/conftool/dbconfig/20220830-135733-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T316186)', diff saved to https://phabricator.wikimedia.org/P33685 and previous config saved to /var/cache/conftool/dbconfig/20220830-135658-ladsgroup.json
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:52 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/Translate/src/PageTranslation/RenderTranslationPageJob.php: 75d8e6c: RenderTranslationPageJob: Add patrol status for translation page (T315708) (duration: 03m 59s)
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P33684 and previous config saved to /var/cache/conftool/dbconfig/20220830-134152-ladsgroup.json
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 25500e5: Make DiscussionTools topicsubscription, autotopicsub opt-out on all wikis (T315714) (duration: 03m 56s)
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ea3228b: Enable reply tool by default on fiwiki (T297533) (duration: 04m 01s)
  • 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:28 vgutierrez: Increase roll-out of query-sorting to 100% - T314868
  • 13:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P33683 and previous config saved to /var/cache/conftool/dbconfig/20220830-132646-ladsgroup.json
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:21 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: Fix language code for Bhojpuri (T313296) (duration: 03m 53s)
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:12 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on 10 more WPs where ContentTranslation is default (T313300) (duration: 03m 56s)
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T316186)', diff saved to https://phabricator.wikimedia.org/P33682 and previous config saved to /var/cache/conftool/dbconfig/20220830-131140-ladsgroup.json
  • 13:11 moritzm: installing libxslt security updates for stretch
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T316186)', diff saved to https://phabricator.wikimedia.org/P33681 and previous config saved to /var/cache/conftool/dbconfig/20220830-130521-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33680 and previous config saved to /var/cache/conftool/dbconfig/20220830-130457-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P33679 and previous config saved to /var/cache/conftool/dbconfig/20220830-124951-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P33678 and previous config saved to /var/cache/conftool/dbconfig/20220830-123445-ladsgroup.json
  • 12:31 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
  • 12:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:24 btullis: committing updated switch configuration https://gerrit.wikimedia.org/r/c/operations/homer/public/+/827979
  • 12:20 godog: rollback and reboot graphite1004 with linux-image-5.10.0-16-amd64
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33677 and previous config saved to /var/cache/conftool/dbconfig/20220830-121938-ladsgroup.json
  • 12:19 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33676 and previous config saved to /var/cache/conftool/dbconfig/20220830-121421-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33675 and previous config saved to /var/cache/conftool/dbconfig/20220830-121357-ladsgroup.json
  • 12:04 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1004.eqiad.wmnet
  • 12:01 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P33674 and previous config saved to /var/cache/conftool/dbconfig/20220830-115851-ladsgroup.json
  • 11:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 11:52 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P33673 and previous config saved to /var/cache/conftool/dbconfig/20220830-114345-ladsgroup.json
  • 11:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:36 moritzm: uploaded libxslt 1.1.29-2.1+deb9u2+wmf1 to apt.wikimedia.org
  • 11:32 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp1002.wikimedia.org
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33672 and previous config saved to /var/cache/conftool/dbconfig/20220830-112838-ladsgroup.json
  • 11:24 volans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
  • 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp1002.wikimedia.org
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33671 and previous config saved to /var/cache/conftool/dbconfig/20220830-112117-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T316186)', diff saved to https://phabricator.wikimedia.org/P33670 and previous config saved to /var/cache/conftool/dbconfig/20220830-112048-ladsgroup.json
  • 11:07 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33669 and previous config saved to /var/cache/conftool/dbconfig/20220830-110542-ladsgroup.json
  • 11:04 volans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P33668 and previous config saved to /var/cache/conftool/dbconfig/20220830-105036-ladsgroup.json
  • 10:50 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling db2096 after maint', diff saved to https://phabricator.wikimedia.org/P33667 and previous config saved to /var/cache/conftool/dbconfig/20220830-104616-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 10:39 btullis: committing updated switch configuration https://gerrit.wikimedia.org/r/c/operations/homer/public/+/826579
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T316186)', diff saved to https://phabricator.wikimedia.org/P33666 and previous config saved to /var/cache/conftool/dbconfig/20220830-103530-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T316186)', diff saved to https://phabricator.wikimedia.org/P33665 and previous config saved to /var/cache/conftool/dbconfig/20220830-103012-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling db1167 (T316186)', diff saved to https://phabricator.wikimedia.org/P33664 and previous config saved to /var/cache/conftool/dbconfig/20220830-102342-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T316186)', diff saved to https://phabricator.wikimedia.org/P33663 and previous config saved to /var/cache/conftool/dbconfig/20220830-102220-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 10:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 10:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 10:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 10:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2003.codfw.wmnet
  • 10:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:08 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 10:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:03 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1011 to pc1 master (duration: 03m 44s)
  • 10:03 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 10:02 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2003.codfw.wmnet
  • 10:01 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 09:55 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 09:53 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@ff76338]: Add sd-alerts notifications to image_suggestions_weekly (duration: 02m 05s)
  • 09:53 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host centrallog2002.codfw.wmnet
  • 09:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 09:51 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@ff76338]: Add sd-alerts notifications to image_suggestions_weekly
  • 09:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 09:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2001.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 09:41 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to old templatelinks fields in s6 (T312865) (duration: 03m 57s)
  • 09:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:34 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc1 master (duration: 03m 50s)
  • 09:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:31 moritzm: draining ganeti2022 for eventual reimage T311686
  • 09:18 moritzm: installing perf updates on Bullseye hosts
  • 09:12 moritzm: upgrading ganeti2027,ganeti2028 to 3.0.2 T312637
  • 09:07 jynus: restart dbprov* hosts
  • 08:58 _joe_: powercycling parse1002, blank console
  • 08:58 moritzm: upgrading ganeti2010,ganeti2012,ganeti2024 to 3.0.2 T312637
  • 08:53 moritzm: failover Ganeti master in codfw to ganeti2020 T311686
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to current x1 codfw master', diff saved to https://phabricator.wikimedia.org/P33661 and previous config saved to /var/cache/conftool/dbconfig/20220830-084945-root.json
  • 08:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 08:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2096 T316522', diff saved to https://phabricator.wikimedia.org/P33660 and previous config saved to /var/cache/conftool/dbconfig/20220830-083845-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2115 to x1 codfw primary T316522', diff saved to https://phabricator.wikimedia.org/P33659 and previous config saved to /var/cache/conftool/dbconfig/20220830-083654-root.json
  • 08:36 marostegui: Starting x1 codfw failover from db2096 to db2115 - T316522
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2115 with weight 0 T316522', diff saved to https://phabricator.wikimedia.org/P33658 and previous config saved to /var/cache/conftool/dbconfig/20220830-083103-root.json
  • 08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: DC switchover x1 T316522
  • 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: DC switchover x1 T316522
  • 08:24 vgutierrez: ATS: enforce per-request timeout globally (205 secs) - T315533
  • 07:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 07:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 07:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:17 tgr: UTC morning deploy window done
  • 07:17 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Declare mediawiki.accountcreation_block stream (T306018) (duration: 04m 11s)
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:53 _joe_: running scap pull on parse1* T316611
  • 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T316186)', diff saved to https://phabricator.wikimedia.org/P33657 and previous config saved to /var/cache/conftool/dbconfig/20220830-063332-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T316186)', diff saved to https://phabricator.wikimedia.org/P33656 and previous config saved to /var/cache/conftool/dbconfig/20220830-062613-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T316186)', diff saved to https://phabricator.wikimedia.org/P33655 and previous config saved to /var/cache/conftool/dbconfig/20220830-062547-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T316186)', diff saved to https://phabricator.wikimedia.org/P33654 and previous config saved to /var/cache/conftool/dbconfig/20220830-061926-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 06:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T316186)', diff saved to https://phabricator.wikimedia.org/P33653 and previous config saved to /var/cache/conftool/dbconfig/20220830-061901-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T316186)', diff saved to https://phabricator.wikimedia.org/P33652 and previous config saved to /var/cache/conftool/dbconfig/20220830-061243-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 06:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33651 and previous config saved to /var/cache/conftool/dbconfig/20220830-061218-ladsgroup.json
  • 06:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33650 and previous config saved to /var/cache/conftool/dbconfig/20220830-060554-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1173 T316110 T312984 T312863 T316186', diff saved to https://phabricator.wikimedia.org/P33649 and previous config saved to /var/cache/conftool/dbconfig/20220830-060543-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T316186)', diff saved to https://phabricator.wikimedia.org/P33648 and previous config saved to /var/cache/conftool/dbconfig/20220830-060509-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1131 to s6 primary and set section read-write T316110', diff saved to https://phabricator.wikimedia.org/P33647 and previous config saved to /var/cache/conftool/dbconfig/20220830-060109-ladsgroup.json
  • 06:00 Amir1: Starting s6 eqiad failover from db1173 to db1131 - T316110
  • 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T316186)', diff saved to https://phabricator.wikimedia.org/P33645 and previous config saved to /var/cache/conftool/dbconfig/20220830-055948-ladsgroup.json
  • 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 05:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 05:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33644 and previous config saved to /var/cache/conftool/dbconfig/20220830-055555-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33643 and previous config saved to /var/cache/conftool/dbconfig/20220830-054924-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T316186)', diff saved to https://phabricator.wikimedia.org/P33642 and previous config saved to /var/cache/conftool/dbconfig/20220830-054859-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T316186)', diff saved to https://phabricator.wikimedia.org/P33641 and previous config saved to /var/cache/conftool/dbconfig/20220830-054242-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T316186)', diff saved to https://phabricator.wikimedia.org/P33640 and previous config saved to /var/cache/conftool/dbconfig/20220830-054217-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T316186)', diff saved to https://phabricator.wikimedia.org/P33639 and previous config saved to /var/cache/conftool/dbconfig/20220830-053559-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T316186)', diff saved to https://phabricator.wikimedia.org/P33638 and previous config saved to /var/cache/conftool/dbconfig/20220830-053529-ladsgroup.json
  • 05:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 05:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T316186)', diff saved to https://phabricator.wikimedia.org/P33637 and previous config saved to /var/cache/conftool/dbconfig/20220830-052930-ladsgroup.json
  • 05:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1131 with weight 0 T316110', diff saved to https://phabricator.wikimedia.org/P33636 and previous config saved to /var/cache/conftool/dbconfig/20220830-051106-ladsgroup.json
  • 05:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T316110
  • 05:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T316110
  • 05:03 ryankemper: T306899 T316496 Deployed WCQS `0.3.115`. That should (hopefully) resolve these tickets.
  • 05:01 ryankemper: [WCQS Deploy] Restarted `wcqs-updater` across all hosts: `sudo -E cumin 'A:wcqs-public' 'systemctl restart wcqs-updater'`
  • 05:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2d34f5c] (wcqs): Deploy 0.3.115 to WCQS (duration: 02m 00s)
  • 04:58 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2d34f5c] (wcqs): Deploy 0.3.115 to WCQS
  • 04:58 ryankemper: [WCQS Deploy] Gearing up for deploy of wcqs `0.3.115`
  • 04:58 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 04:58 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:57 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:45 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@2d34f5c]: 0.3.115 (duration: 09m 01s)
  • 04:37 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.115` on canary `wdqs1003`; proceeding to rest of fleet
  • 04:36 ryankemper@deploy1002: Started deploy [wdqs/wdqs@2d34f5c]: 0.3.115
  • 04:35 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.115`. Pre-deploy tests passing on canary `wdqs1003`
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:29 ejegg: payments-wiki upgraded from dc6d899d to 80657b06
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:04 TimStarling: setting scaling_governor=performance on all mediawiki servers, via puppet gerrit 826405
  • 00:12 aaron@deploy1002: Finished deploy [performance/arc-lamp@40cb764]: T315056 (duration: 00m 07s)
  • 00:12 aaron@deploy1002: Started deploy [performance/arc-lamp@40cb764]: T315056

2022-08-29

  • 23:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:40 krinkle@deploy1002: Synchronized wmf-config/: I9f17d80d9d91 (duration: 03m 53s)
  • 23:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I15a334 (duration: 03m 42s)
  • 23:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:24 krinkle@deploy1002: Synchronized wmf-config/: I5e0e5a (duration: 03m 27s)
  • 23:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:13 krinkle@deploy1002: Synchronized wmf-config/: Id9707d (duration: 03m 48s)
  • 23:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:41 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: force re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001 (duration: 02m 01s)
  • 22:40 tgr: UTC late backport window done
  • 22:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:39 tgr@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GrowthExperiments/extension.json: Backport: Fix WelcomeSurvey CentralAuthPostLoginRedirect hook (step 2) (duration: 03m 53s)
  • 22:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:39 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: force re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001
  • 22:16 ejegg: payments-wiki upgraded from a63b300e to dc6d899d
  • 22:13 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001 (duration: 00m 04s)
  • 22:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: re-deploy HEAD to attempt to get artifacts directory populated on an-airflow1001
  • 21:53 tgr@deploy1002: Synchronized static/images/project-logos: Config: Adjust width-height ratio of logos for bewikisource, euwikisource, cswikisource to fix display issue (T310961) (duration: 03m 59s)
  • 21:48 tgr@deploy1002: Synchronized wmf-config/logos.php: Config: Adjust width-height ratio of logos for bewikisource, euwikisource, cswikisource to fix display issue (T310961) (duration: 03m 34s)
  • 21:44 tgr@deploy1002: Synchronized logos/config.yaml: Config: Adjust width-height ratio of logos for bewikisource, euwikisource, cswikisource to fix display issue (T310961) (duration: 03m 45s)
  • 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GrowthExperiments/tests/selenium/specs/homepage.js: Backport: Temporarily disable change tag test (T316596) (duration: 03m 49s)
  • 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GrowthExperiments/includes/WelcomeSurveyHooks.php: Backport: Fix WelcomeSurvey CentralAuthPostLoginRedirect hook (step 1) (T315583 T316311) (duration: 03m 36s)
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/ConfirmEdit/includes/Auth/CaptchaAuthenticationRequest.php: Backport: Restore auth request ID from before namespacing (T316410) (duration: 03m 45s)
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:42 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Vector: Backport: Fix site notice spacing (T315595) (duration: 03m 46s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 cjming@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/DiscussionTools/maintenance: Backport: Fix boilerplate in maintenance scripts for WMF production (T316548) (duration: 03m 41s)
  • 20:27 cjming@deploy1002: sync-file aborted: Backport: Fix boilerplate in maintenance scripts for WMF production (T316548) (duration: 00m 05s)
  • 20:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 bblack: Revert of cookie-related changes https://gerrit.wikimedia.org/r/c/operations/puppet/+/827566/ pushing to all cp-text
  • 20:14 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatibility (duration: 00m 24s)
  • 20:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatibility
  • 20:08 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable new Vector skin on select pages (take 2)" (T309973) (duration: 03m 34s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:04 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatibility (duration: 00m 11s)
  • 20:04 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@57fb704]: Deploy mjolnir 1.1 for elasticsearch 7.x compatibility
  • 19:34 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@5c0af35]: Update to work with elasticsearch 7.x (duration: 00m 54s)
  • 19:33 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@5c0af35]: Update to work with elasticsearch 7.x
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33634 and previous config saved to /var/cache/conftool/dbconfig/20220829-192608-ladsgroup.json
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P33633 and previous config saved to /var/cache/conftool/dbconfig/20220829-191950-ladsgroup.json
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33632 and previous config saved to /var/cache/conftool/dbconfig/20220829-190444-ladsgroup.json
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33631 and previous config saved to /var/cache/conftool/dbconfig/20220829-185723-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T316186)', diff saved to https://phabricator.wikimedia.org/P33630 and previous config saved to /var/cache/conftool/dbconfig/20220829-185659-ladsgroup.json
  • 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P33629 and previous config saved to /var/cache/conftool/dbconfig/20220829-184153-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P33628 and previous config saved to /var/cache/conftool/dbconfig/20220829-182646-ladsgroup.json
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T316186)', diff saved to https://phabricator.wikimedia.org/P33627 and previous config saved to /var/cache/conftool/dbconfig/20220829-181140-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T316186)', diff saved to https://phabricator.wikimedia.org/P33626 and previous config saved to /var/cache/conftool/dbconfig/20220829-180421-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33625 and previous config saved to /var/cache/conftool/dbconfig/20220829-180358-ladsgroup.json
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33624 and previous config saved to /var/cache/conftool/dbconfig/20220829-174851-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P33623 and previous config saved to /var/cache/conftool/dbconfig/20220829-173345-ladsgroup.json
  • 17:25 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/SecurePoll/includes/Pages/VoterEligibilityPage.php: 2d6c378: Add missing comma (T316150) (duration: 03m 47s)
  • 17:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33622 and previous config saved to /var/cache/conftool/dbconfig/20220829-171839-ladsgroup.json
  • 17:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33621 and previous config saved to /var/cache/conftool/dbconfig/20220829-171116-ladsgroup.json
  • 17:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33620 and previous config saved to /var/cache/conftool/dbconfig/20220829-171035-ladsgroup.json
  • 17:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on restbase[1031-1033].eqiad.wmnet with reason: New hosts - awaiting cassandra joins
  • 17:03 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on restbase[1031-1033].eqiad.wmnet with reason: New hosts - awaiting cassandra joins
  • 17:02 krinkle@deploy1002: Synchronized wmf-config/: I1f79f21cbf8 (duration: 03m 42s)
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P33619 and previous config saved to /var/cache/conftool/dbconfig/20220829-165529-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P33618 and previous config saved to /var/cache/conftool/dbconfig/20220829-164022-ladsgroup.json
  • 16:38 krinkle@deploy1002: Synchronized wmf-config/: I23c221 (duration: 03m 57s)
  • 16:34 krinkle@deploy1002: sync-file aborted: (no justification provided) (duration: 00m 01s)
  • 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33617 and previous config saved to /var/cache/conftool/dbconfig/20220829-162516-ladsgroup.json
  • 16:24 claime: repooled wtp1034.eqiad.wmnet and depooled parse1001.eqiad.wmnet
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33616 and previous config saved to /var/cache/conftool/dbconfig/20220829-161959-ladsgroup.json
  • 16:12 claime: depooled wtp1034.eqiad.wmnet from parsoid cluster https://phabricator.wikimedia.org/T312638
  • 16:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:08 claime: pooled parse1001.eqiad.wmnet (php 7.4 only) in parsoid cluster https://phabricator.wikimedia.org/T312638
  • 16:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1033.eqiad.wmnet with OS buster
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33615 and previous config saved to /var/cache/conftool/dbconfig/20220829-160452-ladsgroup.json
  • 16:02 cgoubert@puppetmaster1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=parsoid,name=parse1001.eqiad.wmnet
  • 16:02 cgoubert@puppetmaster1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=parsoid,name=parse1001.eqiad.wmnet
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P33614 and previous config saved to /var/cache/conftool/dbconfig/20220829-154946-ladsgroup.json
  • 15:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33613 and previous config saved to /var/cache/conftool/dbconfig/20220829-153440-ladsgroup.json
  • 15:31 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33612 and previous config saved to /var/cache/conftool/dbconfig/20220829-152741-ladsgroup.json
  • 15:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1033.eqiad.wmnet with reason: host reimage
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33611 and previous config saved to /var/cache/conftool/dbconfig/20220829-152612-ladsgroup.json
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T316186)', diff saved to https://phabricator.wikimedia.org/P33610 and previous config saved to /var/cache/conftool/dbconfig/20220829-152549-ladsgroup.json
  • 15:14 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1033.eqiad.wmnet with OS buster
  • 15:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1032.eqiad.wmnet with OS buster
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33609 and previous config saved to /var/cache/conftool/dbconfig/20220829-151042-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P33608 and previous config saved to /var/cache/conftool/dbconfig/20220829-145536-ladsgroup.json
  • 14:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 14:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on restbase1031.eqiad.wmnet with reason: New host
  • 14:41 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on restbase1031.eqiad.wmnet with reason: New host
  • 14:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1032.eqiad.wmnet with reason: host reimage
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T316186)', diff saved to https://phabricator.wikimedia.org/P33607 and previous config saved to /var/cache/conftool/dbconfig/20220829-144030-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T316186)', diff saved to https://phabricator.wikimedia.org/P33606 and previous config saved to /var/cache/conftool/dbconfig/20220829-143319-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T316186)', diff saved to https://phabricator.wikimedia.org/P33605 and previous config saved to /var/cache/conftool/dbconfig/20220829-143255-ladsgroup.json
  • 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1032.eqiad.wmnet with OS buster
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P33604 and previous config saved to /var/cache/conftool/dbconfig/20220829-141749-ladsgroup.json
  • 14:06 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/: Config: Remove unused SearchSettingsForSDC.php (2/2, no-op; syncing deleted file requires syncing entire directory AFAICT) (duration: 03m 37s)
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P33603 and previous config saved to /var/cache/conftool/dbconfig/20220829-140243-ladsgroup.json
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForWikibase.php: Config: Remove unused SearchSettingsForSDC.php (1/2, no-op) (duration: 03m 32s)
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T316186)', diff saved to https://phabricator.wikimedia.org/P33602 and previous config saved to /var/cache/conftool/dbconfig/20220829-134736-ladsgroup.json
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T316186)', diff saved to https://phabricator.wikimedia.org/P33601 and previous config saved to /var/cache/conftool/dbconfig/20220829-134014-ladsgroup.json
  • 13:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 13:33 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgDiscussionToolsEnablePermalinksBackend on testwiki (T315353) (duration: 03m 48s)
  • 13:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:31 marostegui: Failover m5 master
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:29 taavi@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php: Backport: persistRevisionThreadItems: Allow processing current revisions only (T315510) (duration: 03m 40s)
  • 13:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki testwiki discussiontools
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 taavi@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/SecurePoll/: T316150 (duration: 03m 44s)
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:14 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 1% of users to php 7.4 (duration: 04m 18s)
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:12 vgutierrez: Increase roll-out of query-sorting to 75% - T314868
  • 13:06 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 13:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 12:14 vgutierrez: rolling restart of ats-be fleet wide to apply "Hide non session cookies during cache lookup" - T316338 T316337
  • 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host restbase1031.eqiad.wmnet with OS buster
  • 12:03 hnowlan: joining restbase1031-a to cassandra cluster
  • 12:03 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on restbase1031.eqiad.wmnet with reason: New host
  • 12:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on restbase1031.eqiad.wmnet with reason: New host
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33600 and previous config saved to /var/cache/conftool/dbconfig/20220829-115107-ladsgroup.json
  • 11:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P33599 and previous config saved to /var/cache/conftool/dbconfig/20220829-113600-ladsgroup.json
  • 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1031.eqiad.wmnet with reason: host reimage
  • 11:21 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1031.eqiad.wmnet with OS buster
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P33598 and previous config saved to /var/cache/conftool/dbconfig/20220829-112054-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33597 and previous config saved to /var/cache/conftool/dbconfig/20220829-110548-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33596 and previous config saved to /var/cache/conftool/dbconfig/20220829-105928-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33595 and previous config saved to /var/cache/conftool/dbconfig/20220829-105904-ladsgroup.json
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P33593 and previous config saved to /var/cache/conftool/dbconfig/20220829-104358-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P33592 and previous config saved to /var/cache/conftool/dbconfig/20220829-102851-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33591 and previous config saved to /var/cache/conftool/dbconfig/20220829-101345-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33590 and previous config saved to /var/cache/conftool/dbconfig/20220829-101029-ladsgroup.json
  • 10:09 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in drmrs - T316338 T316337
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33589 and previous config saved to /var/cache/conftool/dbconfig/20220829-095523-ladsgroup.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P33587 and previous config saved to /var/cache/conftool/dbconfig/20220829-094017-ladsgroup.json
  • 09:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33586 and previous config saved to /var/cache/conftool/dbconfig/20220829-092511-ladsgroup.json
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33585 and previous config saved to /var/cache/conftool/dbconfig/20220829-092005-ladsgroup.json
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33584 and previous config saved to /var/cache/conftool/dbconfig/20220829-091840-ladsgroup.json
  • 09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33583 and previous config saved to /var/cache/conftool/dbconfig/20220829-091816-ladsgroup.json
  • 09:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:03 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33582 and previous config saved to /var/cache/conftool/dbconfig/20220829-090310-ladsgroup.json
  • 08:55 vgutierrez: test trafficserver: Hide non session cookies during cache lookup in cp6016 - T316338 T316337
  • 08:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P33581 and previous config saved to /var/cache/conftool/dbconfig/20220829-084804-ladsgroup.json
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33580 and previous config saved to /var/cache/conftool/dbconfig/20220829-083258-ladsgroup.json
  • 08:31 marostegui: Failover m2 from db1159 to db1164 - T316202
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33579 and previous config saved to /var/cache/conftool/dbconfig/20220829-082643-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P33578 and previous config saved to /var/cache/conftool/dbconfig/20220829-081136-ladsgroup.json
  • 08:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:05 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Moving 0.1% of users to php 7.4 (duration: 03m 52s)
  • 08:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:58 vgutierrez: Increase roll-out of query-sorting to 50% - T314868
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P33577 and previous config saved to /var/cache/conftool/dbconfig/20220829-075630-ladsgroup.json
  • 07:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33576 and previous config saved to /var/cache/conftool/dbconfig/20220829-074124-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33575 and previous config saved to /var/cache/conftool/dbconfig/20220829-073516-ladsgroup.json
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33574 and previous config saved to /var/cache/conftool/dbconfig/20220829-073354-ladsgroup.json
  • 07:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 07:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33573 and previous config saved to /var/cache/conftool/dbconfig/20220829-073330-ladsgroup.json
  • 07:30 marostegui: Failover m3-master
  • 07:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P33572 and previous config saved to /var/cache/conftool/dbconfig/20220829-071824-ladsgroup.json
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2133,2160].codfw.wmnet,db[1117,1159,1164].eqiad.wmnet with reason: Switchover m2 T316202
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2133,2160].codfw.wmnet,db[1117,1159,1164].eqiad.wmnet with reason: Switchover m2 T316202
  • 07:14 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 88b3ce8: Revert "testwiki: Growth: Assign enrollasmentor to *" (T310905, T314414) (duration: 03m 32s)
  • 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 20d6238: cswiki: fix extendedconfirmed permission for bot group (duration: 03m 43s)
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P33571 and previous config saved to /var/cache/conftool/dbconfig/20220829-070318-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33570 and previous config saved to /var/cache/conftool/dbconfig/20220829-064811-ladsgroup.json
  • 06:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33569 and previous config saved to /var/cache/conftool/dbconfig/20220829-064154-ladsgroup.json
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33568 and previous config saved to /var/cache/conftool/dbconfig/20220829-064113-ladsgroup.json
  • 06:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P33567 and previous config saved to /var/cache/conftool/dbconfig/20220829-062607-ladsgroup.json
  • 06:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:22 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to old templatelinks fields in commons (T312865) (duration: 03m 43s)
  • 06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P33566 and previous config saved to /var/cache/conftool/dbconfig/20220829-061100-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33565 and previous config saved to /var/cache/conftool/dbconfig/20220829-055554-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33564 and previous config saved to /var/cache/conftool/dbconfig/20220829-054939-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 05:44 hashar: Restarted Gerrit for 3.4.5 upgrade
  • 05:40 hashar@deploy1002: Finished deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit1001 (duration: 00m 09s)
  • 05:40 hashar@deploy1002: Started deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit1001
  • 05:37 hashar@deploy1002: Finished deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit2002 (duration: 00m 11s)
  • 05:36 hashar@deploy1002: Started deploy [gerrit/gerrit@f1a820b]: Gerrit to 3.4.5 on gerrit2002
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust weights on s1 T316481', diff saved to https://phabricator.wikimedia.org/P33563 and previous config saved to /var/cache/conftool/dbconfig/20220829-051206-marostegui.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2103 as master in dbctl T316481', diff saved to https://phabricator.wikimedia.org/P33562 and previous config saved to /var/cache/conftool/dbconfig/20220829-051020-marostegui.json

2022-08-28

  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P33561 and previous config saved to /var/cache/conftool/dbconfig/20220828-210336-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P33560 and previous config saved to /var/cache/conftool/dbconfig/20220828-210235-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P33559 and previous config saved to /var/cache/conftool/dbconfig/20220828-204729-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T316186)', diff saved to https://phabricator.wikimedia.org/P33558 and previous config saved to /var/cache/conftool/dbconfig/20220828-203223-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T316186)', diff saved to https://phabricator.wikimedia.org/P33557 and previous config saved to /var/cache/conftool/dbconfig/20220828-202701-ladsgroup.json
  • 20:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T316186)', diff saved to https://phabricator.wikimedia.org/P33556 and previous config saved to /var/cache/conftool/dbconfig/20220828-202638-ladsgroup.json
  • 20:18 ori: mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set energy-performance preference to 0 via 'x86_energy_perf_policy --hwp-epp 0' T315398
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P33555 and previous config saved to /var/cache/conftool/dbconfig/20220828-201131-ladsgroup.json
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P33554 and previous config saved to /var/cache/conftool/dbconfig/20220828-195625-ladsgroup.json
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T316186)', diff saved to https://phabricator.wikimedia.org/P33553 and previous config saved to /var/cache/conftool/dbconfig/20220828-194119-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T316186)', diff saved to https://phabricator.wikimedia.org/P33552 and previous config saved to /var/cache/conftool/dbconfig/20220828-193500-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33551 and previous config saved to /var/cache/conftool/dbconfig/20220828-192705-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33550 and previous config saved to /var/cache/conftool/dbconfig/20220828-192550-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33549 and previous config saved to /var/cache/conftool/dbconfig/20220828-192042-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33548 and previous config saved to /var/cache/conftool/dbconfig/20220828-192016-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33547 and previous config saved to /var/cache/conftool/dbconfig/20220828-191951-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33546 and previous config saved to /var/cache/conftool/dbconfig/20220828-191440-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2107 (T316186)', diff saved to https://phabricator.wikimedia.org/P33545 and previous config saved to /var/cache/conftool/dbconfig/20220828-191414-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2107 (T316186)', diff saved to https://phabricator.wikimedia.org/P33544 and previous config saved to /var/cache/conftool/dbconfig/20220828-190849-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T316186)', diff saved to https://phabricator.wikimedia.org/P33543 and previous config saved to /var/cache/conftool/dbconfig/20220828-190824-ladsgroup.json
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T316186)', diff saved to https://phabricator.wikimedia.org/P33542 and previous config saved to /var/cache/conftool/dbconfig/20220828-190303-ladsgroup.json
  • 19:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 19:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T316186)', diff saved to https://phabricator.wikimedia.org/P33541 and previous config saved to /var/cache/conftool/dbconfig/20220828-190238-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T316186)', diff saved to https://phabricator.wikimedia.org/P33540 and previous config saved to /var/cache/conftool/dbconfig/20220828-185606-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33539 and previous config saved to /var/cache/conftool/dbconfig/20220828-185536-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T316186)', diff saved to https://phabricator.wikimedia.org/P33538 and previous config saved to /var/cache/conftool/dbconfig/20220828-185022-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33537 and previous config saved to /var/cache/conftool/dbconfig/20220828-184542-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T316186)', diff saved to https://phabricator.wikimedia.org/P33536 and previous config saved to /var/cache/conftool/dbconfig/20220828-183915-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T316186)', diff saved to https://phabricator.wikimedia.org/P33535 and previous config saved to /var/cache/conftool/dbconfig/20220828-183850-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T316186)', diff saved to https://phabricator.wikimedia.org/P33534 and previous config saved to /var/cache/conftool/dbconfig/20220828-183226-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T316186)', diff saved to https://phabricator.wikimedia.org/P33533 and previous config saved to /var/cache/conftool/dbconfig/20220828-183156-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T316186)', diff saved to https://phabricator.wikimedia.org/P33532 and previous config saved to /var/cache/conftool/dbconfig/20220828-182630-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33531 and previous config saved to /var/cache/conftool/dbconfig/20220828-182605-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33530 and previous config saved to /var/cache/conftool/dbconfig/20220828-182350-ladsgroup.json
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33529 and previous config saved to /var/cache/conftool/dbconfig/20220828-181830-ladsgroup.json
  • 18:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33528 and previous config saved to /var/cache/conftool/dbconfig/20220828-181805-ladsgroup.json
  • 18:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 18:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T316186)', diff saved to https://phabricator.wikimedia.org/P33527 and previous config saved to /var/cache/conftool/dbconfig/20220828-181421-ladsgroup.json
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T316186)', diff saved to https://phabricator.wikimedia.org/P33526 and previous config saved to /var/cache/conftool/dbconfig/20220828-180751-ladsgroup.json
  • 18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T316186)', diff saved to https://phabricator.wikimedia.org/P33525 and previous config saved to /var/cache/conftool/dbconfig/20220828-180725-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T316186)', diff saved to https://phabricator.wikimedia.org/P33524 and previous config saved to /var/cache/conftool/dbconfig/20220828-180108-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T316186)', diff saved to https://phabricator.wikimedia.org/P33523 and previous config saved to /var/cache/conftool/dbconfig/20220828-180042-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T316186)', diff saved to https://phabricator.wikimedia.org/P33522 and previous config saved to /var/cache/conftool/dbconfig/20220828-175311-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33521 and previous config saved to /var/cache/conftool/dbconfig/20220828-175246-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T316186)', diff saved to https://phabricator.wikimedia.org/P33520 and previous config saved to /var/cache/conftool/dbconfig/20220828-174655-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33519 and previous config saved to /var/cache/conftool/dbconfig/20220828-174630-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33518 and previous config saved to /var/cache/conftool/dbconfig/20220828-174059-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling failed', diff saved to https://phabricator.wikimedia.org/P33517 and previous config saved to /var/cache/conftool/dbconfig/20220828-174002-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T316186)', diff saved to https://phabricator.wikimedia.org/P33516 and previous config saved to /var/cache/conftool/dbconfig/20220828-173304-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T316186)', diff saved to https://phabricator.wikimedia.org/P33515 and previous config saved to /var/cache/conftool/dbconfig/20220828-173241-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P33514 and previous config saved to /var/cache/conftool/dbconfig/20220828-171734-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P33513 and previous config saved to /var/cache/conftool/dbconfig/20220828-170228-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T316186)', diff saved to https://phabricator.wikimedia.org/P33512 and previous config saved to /var/cache/conftool/dbconfig/20220828-164722-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T316186)', diff saved to https://phabricator.wikimedia.org/P33511 and previous config saved to /var/cache/conftool/dbconfig/20220828-164211-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T316186)', diff saved to https://phabricator.wikimedia.org/P33510 and previous config saved to /var/cache/conftool/dbconfig/20220828-164004-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T316186)', diff saved to https://phabricator.wikimedia.org/P33509 and previous config saved to /var/cache/conftool/dbconfig/20220828-163447-ladsgroup.json
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33508 and previous config saved to /var/cache/conftool/dbconfig/20220828-163211-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33507 and previous config saved to /var/cache/conftool/dbconfig/20220828-162906-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33506 and previous config saved to /var/cache/conftool/dbconfig/20220828-162349-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 16:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33505 and previous config saved to /var/cache/conftool/dbconfig/20220828-162324-ladsgroup.json
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P33504 and previous config saved to /var/cache/conftool/dbconfig/20220828-160818-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P33503 and previous config saved to /var/cache/conftool/dbconfig/20220828-155312-ladsgroup.json
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33502 and previous config saved to /var/cache/conftool/dbconfig/20220828-153806-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33501 and previous config saved to /var/cache/conftool/dbconfig/20220828-153349-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P33499 and previous config saved to /var/cache/conftool/dbconfig/20220828-150336-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33498 and previous config saved to /var/cache/conftool/dbconfig/20220828-144830-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33497 and previous config saved to /var/cache/conftool/dbconfig/20220828-144319-ladsgroup.json
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T316186)', diff saved to https://phabricator.wikimedia.org/P33496 and previous config saved to /var/cache/conftool/dbconfig/20220828-144257-ladsgroup.json
  • 14:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 14:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33495 and previous config saved to /var/cache/conftool/dbconfig/20220828-144232-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33494 and previous config saved to /var/cache/conftool/dbconfig/20220828-142726-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33493 and previous config saved to /var/cache/conftool/dbconfig/20220828-141220-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33492 and previous config saved to /var/cache/conftool/dbconfig/20220828-135713-ladsgroup.json
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33491 and previous config saved to /var/cache/conftool/dbconfig/20220828-135158-ladsgroup.json
  • 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33490 and previous config saved to /var/cache/conftool/dbconfig/20220828-135133-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33489 and previous config saved to /var/cache/conftool/dbconfig/20220828-133627-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33488 and previous config saved to /var/cache/conftool/dbconfig/20220828-132120-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33487 and previous config saved to /var/cache/conftool/dbconfig/20220828-130614-ladsgroup.json
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33486 and previous config saved to /var/cache/conftool/dbconfig/20220828-130059-ladsgroup.json
  • 13:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T316186)', diff saved to https://phabricator.wikimedia.org/P33485 and previous config saved to /var/cache/conftool/dbconfig/20220828-130033-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33484 and previous config saved to /var/cache/conftool/dbconfig/20220828-124527-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33483 and previous config saved to /var/cache/conftool/dbconfig/20220828-123021-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T316186)', diff saved to https://phabricator.wikimedia.org/P33482 and previous config saved to /var/cache/conftool/dbconfig/20220828-121515-ladsgroup.json
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T316186)', diff saved to https://phabricator.wikimedia.org/P33481 and previous config saved to /var/cache/conftool/dbconfig/20220828-121000-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T316186)', diff saved to https://phabricator.wikimedia.org/P33480 and previous config saved to /var/cache/conftool/dbconfig/20220828-120931-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33479 and previous config saved to /var/cache/conftool/dbconfig/20220828-115424-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33478 and previous config saved to /var/cache/conftool/dbconfig/20220828-113918-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T316186)', diff saved to https://phabricator.wikimedia.org/P33477 and previous config saved to /var/cache/conftool/dbconfig/20220828-112412-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T316186)', diff saved to https://phabricator.wikimedia.org/P33476 and previous config saved to /var/cache/conftool/dbconfig/20220828-111857-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33475 and previous config saved to /var/cache/conftool/dbconfig/20220828-111832-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33474 and previous config saved to /var/cache/conftool/dbconfig/20220828-110326-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33473 and previous config saved to /var/cache/conftool/dbconfig/20220828-104820-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33472 and previous config saved to /var/cache/conftool/dbconfig/20220828-103314-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T316186)', diff saved to https://phabricator.wikimedia.org/P33471 and previous config saved to /var/cache/conftool/dbconfig/20220828-102800-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T316186)', diff saved to https://phabricator.wikimedia.org/P33470 and previous config saved to /var/cache/conftool/dbconfig/20220828-102423-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33469 and previous config saved to /var/cache/conftool/dbconfig/20220828-100917-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33468 and previous config saved to /var/cache/conftool/dbconfig/20220828-095411-ladsgroup.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T316186)', diff saved to https://phabricator.wikimedia.org/P33467 and previous config saved to /var/cache/conftool/dbconfig/20220828-093904-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T316186)', diff saved to https://phabricator.wikimedia.org/P33466 and previous config saved to /var/cache/conftool/dbconfig/20220828-093346-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33465 and previous config saved to /var/cache/conftool/dbconfig/20220828-082851-ladsgroup.json
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P33464 and previous config saved to /var/cache/conftool/dbconfig/20220828-081344-ladsgroup.json
  • 07:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P33463 and previous config saved to /var/cache/conftool/dbconfig/20220828-075838-ladsgroup.json
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33462 and previous config saved to /var/cache/conftool/dbconfig/20220828-074332-ladsgroup.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33461 and previous config saved to /var/cache/conftool/dbconfig/20220828-074116-ladsgroup.json
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P33460 and previous config saved to /var/cache/conftool/dbconfig/20220828-072610-ladsgroup.json
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P33459 and previous config saved to /var/cache/conftool/dbconfig/20220828-071103-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33458 and previous config saved to /var/cache/conftool/dbconfig/20220828-065557-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33457 and previous config saved to /var/cache/conftool/dbconfig/20220828-064952-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33456 and previous config saved to /var/cache/conftool/dbconfig/20220828-064920-ladsgroup.json
  • 06:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 06:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113 (T316186)', diff saved to https://phabricator.wikimedia.org/P33455 and previous config saved to /var/cache/conftool/dbconfig/20220828-064855-ladsgroup.json
  • 06:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113', diff saved to https://phabricator.wikimedia.org/P33454 and previous config saved to /var/cache/conftool/dbconfig/20220828-063348-ladsgroup.json
  • 06:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113', diff saved to https://phabricator.wikimedia.org/P33453 and previous config saved to /var/cache/conftool/dbconfig/20220828-061842-ladsgroup.json
  • 06:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2113 (T316186)', diff saved to https://phabricator.wikimedia.org/P33452 and previous config saved to /var/cache/conftool/dbconfig/20220828-060336-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2113 (T316186)', diff saved to https://phabricator.wikimedia.org/P33451 and previous config saved to /var/cache/conftool/dbconfig/20220828-055821-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T316186)', diff saved to https://phabricator.wikimedia.org/P33450 and previous config saved to /var/cache/conftool/dbconfig/20220828-055756-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P33449 and previous config saved to /var/cache/conftool/dbconfig/20220828-054249-ladsgroup.json
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P33448 and previous config saved to /var/cache/conftool/dbconfig/20220828-052743-ladsgroup.json
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T316186)', diff saved to https://phabricator.wikimedia.org/P33447 and previous config saved to /var/cache/conftool/dbconfig/20220828-051237-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T316186)', diff saved to https://phabricator.wikimedia.org/P33446 and previous config saved to /var/cache/conftool/dbconfig/20220828-050729-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 05:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33445 and previous config saved to /var/cache/conftool/dbconfig/20220828-050704-ladsgroup.json
  • 04:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P33444 and previous config saved to /var/cache/conftool/dbconfig/20220828-045157-ladsgroup.json
  • 04:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P33443 and previous config saved to /var/cache/conftool/dbconfig/20220828-043651-ladsgroup.json
  • 04:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33442 and previous config saved to /var/cache/conftool/dbconfig/20220828-042145-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T316186)', diff saved to https://phabricator.wikimedia.org/P33441 and previous config saved to /var/cache/conftool/dbconfig/20220828-041622-ladsgroup.json
  • 04:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T316186)', diff saved to https://phabricator.wikimedia.org/P33440 and previous config saved to /var/cache/conftool/dbconfig/20220828-041231-ladsgroup.json
  • 03:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P33439 and previous config saved to /var/cache/conftool/dbconfig/20220828-035725-ladsgroup.json
  • 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P33438 and previous config saved to /var/cache/conftool/dbconfig/20220828-034219-ladsgroup.json
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T316186)', diff saved to https://phabricator.wikimedia.org/P33437 and previous config saved to /var/cache/conftool/dbconfig/20220828-032713-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T316186)', diff saved to https://phabricator.wikimedia.org/P33436 and previous config saved to /var/cache/conftool/dbconfig/20220828-032202-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T316186)', diff saved to https://phabricator.wikimedia.org/P33435 and previous config saved to /var/cache/conftool/dbconfig/20220828-032137-ladsgroup.json
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P33434 and previous config saved to /var/cache/conftool/dbconfig/20220828-030631-ladsgroup.json
  • 02:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P33433 and previous config saved to /var/cache/conftool/dbconfig/20220828-025124-ladsgroup.json
  • 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T316186)', diff saved to https://phabricator.wikimedia.org/P33432 and previous config saved to /var/cache/conftool/dbconfig/20220828-023618-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T316186)', diff saved to https://phabricator.wikimedia.org/P33431 and previous config saved to /var/cache/conftool/dbconfig/20220828-023111-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 02:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 02:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T316186)', diff saved to https://phabricator.wikimedia.org/P33430 and previous config saved to /var/cache/conftool/dbconfig/20220828-022620-ladsgroup.json
  • 02:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P33429 and previous config saved to /var/cache/conftool/dbconfig/20220828-021114-ladsgroup.json
  • 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P33428 and previous config saved to /var/cache/conftool/dbconfig/20220828-015608-ladsgroup.json
  • 01:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T316186)', diff saved to https://phabricator.wikimedia.org/P33427 and previous config saved to /var/cache/conftool/dbconfig/20220828-014101-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T316186)', diff saved to https://phabricator.wikimedia.org/P33426 and previous config saved to /var/cache/conftool/dbconfig/20220828-013558-ladsgroup.json
  • 01:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 01:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T316186)', diff saved to https://phabricator.wikimedia.org/P33425 and previous config saved to /var/cache/conftool/dbconfig/20220828-013534-ladsgroup.json
  • 01:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P33424 and previous config saved to /var/cache/conftool/dbconfig/20220828-012028-ladsgroup.json
  • 01:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P33423 and previous config saved to /var/cache/conftool/dbconfig/20220828-010522-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T316186)', diff saved to https://phabricator.wikimedia.org/P33422 and previous config saved to /var/cache/conftool/dbconfig/20220828-005015-ladsgroup.json
  • 00:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T316186)', diff saved to https://phabricator.wikimedia.org/P33421 and previous config saved to /var/cache/conftool/dbconfig/20220828-004410-ladsgroup.json
  • 00:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33420 and previous config saved to /var/cache/conftool/dbconfig/20220828-004329-ladsgroup.json
  • 00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P33419 and previous config saved to /var/cache/conftool/dbconfig/20220828-002823-ladsgroup.json
  • 00:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P33418 and previous config saved to /var/cache/conftool/dbconfig/20220828-001317-ladsgroup.json

2022-08-27

  • 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33417 and previous config saved to /var/cache/conftool/dbconfig/20220827-235810-ladsgroup.json
  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33416 and previous config saved to /var/cache/conftool/dbconfig/20220827-235556-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P33415 and previous config saved to /var/cache/conftool/dbconfig/20220827-234050-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P33414 and previous config saved to /var/cache/conftool/dbconfig/20220827-232544-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33413 and previous config saved to /var/cache/conftool/dbconfig/20220827-231038-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33412 and previous config saved to /var/cache/conftool/dbconfig/20220827-230339-ladsgroup.json
  • 23:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T316186)', diff saved to https://phabricator.wikimedia.org/P33411 and previous config saved to /var/cache/conftool/dbconfig/20220827-230214-ladsgroup.json
  • 23:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33410 and previous config saved to /var/cache/conftool/dbconfig/20220827-230150-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P33408 and previous config saved to /var/cache/conftool/dbconfig/20220827-223137-ladsgroup.json
  • 22:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149', diff saved to https://phabricator.wikimedia.org/P33407 and previous config saved to /var/cache/conftool/dbconfig/20220827-221749-ladsgroup.json
  • 22:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2149.codfw.wmnet with reason: Sad disk
  • 22:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2149.codfw.wmnet with reason: Sad disk
  • 22:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33406 and previous config saved to /var/cache/conftool/dbconfig/20220827-221631-ladsgroup.json
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T316186)', diff saved to https://phabricator.wikimedia.org/P33405 and previous config saved to /var/cache/conftool/dbconfig/20220827-221118-ladsgroup.json
  • 22:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 22:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33404 and previous config saved to /var/cache/conftool/dbconfig/20220827-205809-ladsgroup.json
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P33403 and previous config saved to /var/cache/conftool/dbconfig/20220827-204303-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P33402 and previous config saved to /var/cache/conftool/dbconfig/20220827-202757-ladsgroup.json
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33401 and previous config saved to /var/cache/conftool/dbconfig/20220827-201250-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T316186)', diff saved to https://phabricator.wikimedia.org/P33400 and previous config saved to /var/cache/conftool/dbconfig/20220827-200639-ladsgroup.json
  • 20:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 20:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T316186)', diff saved to https://phabricator.wikimedia.org/P33399 and previous config saved to /var/cache/conftool/dbconfig/20220827-200559-ladsgroup.json
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P33398 and previous config saved to /var/cache/conftool/dbconfig/20220827-195053-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P33397 and previous config saved to /var/cache/conftool/dbconfig/20220827-193546-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T316186)', diff saved to https://phabricator.wikimedia.org/P33396 and previous config saved to /var/cache/conftool/dbconfig/20220827-192040-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T316186)', diff saved to https://phabricator.wikimedia.org/P33395 and previous config saved to /var/cache/conftool/dbconfig/20220827-191515-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33394 and previous config saved to /var/cache/conftool/dbconfig/20220827-191450-ladsgroup.json
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P33393 and previous config saved to /var/cache/conftool/dbconfig/20220827-185944-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P33392 and previous config saved to /var/cache/conftool/dbconfig/20220827-184438-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33391 and previous config saved to /var/cache/conftool/dbconfig/20220827-182931-ladsgroup.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T316186)', diff saved to https://phabricator.wikimedia.org/P33390 and previous config saved to /var/cache/conftool/dbconfig/20220827-182408-ladsgroup.json
  • 18:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T316186)', diff saved to https://phabricator.wikimedia.org/P33389 and previous config saved to /var/cache/conftool/dbconfig/20220827-182343-ladsgroup.json
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P33388 and previous config saved to /var/cache/conftool/dbconfig/20220827-180836-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P33387 and previous config saved to /var/cache/conftool/dbconfig/20220827-175330-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T316186)', diff saved to https://phabricator.wikimedia.org/P33386 and previous config saved to /var/cache/conftool/dbconfig/20220827-173824-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T316186)', diff saved to https://phabricator.wikimedia.org/P33385 and previous config saved to /var/cache/conftool/dbconfig/20220827-173305-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T316186)', diff saved to https://phabricator.wikimedia.org/P33384 and previous config saved to /var/cache/conftool/dbconfig/20220827-173240-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P33383 and previous config saved to /var/cache/conftool/dbconfig/20220827-171734-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P33382 and previous config saved to /var/cache/conftool/dbconfig/20220827-170227-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T316186)', diff saved to https://phabricator.wikimedia.org/P33381 and previous config saved to /var/cache/conftool/dbconfig/20220827-164721-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T316186)', diff saved to https://phabricator.wikimedia.org/P33380 and previous config saved to /var/cache/conftool/dbconfig/20220827-164156-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33379 and previous config saved to /var/cache/conftool/dbconfig/20220827-163528-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P33378 and previous config saved to /var/cache/conftool/dbconfig/20220827-162022-ladsgroup.json
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P33377 and previous config saved to /var/cache/conftool/dbconfig/20220827-160516-ladsgroup.json
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33376 and previous config saved to /var/cache/conftool/dbconfig/20220827-155010-ladsgroup.json
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T316186)', diff saved to https://phabricator.wikimedia.org/P33375 and previous config saved to /var/cache/conftool/dbconfig/20220827-154452-ladsgroup.json
  • 15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33374 and previous config saved to /var/cache/conftool/dbconfig/20220827-154410-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P33373 and previous config saved to /var/cache/conftool/dbconfig/20220827-152903-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P33372 and previous config saved to /var/cache/conftool/dbconfig/20220827-151357-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33371 and previous config saved to /var/cache/conftool/dbconfig/20220827-145851-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T316186)', diff saved to https://phabricator.wikimedia.org/P33370 and previous config saved to /var/cache/conftool/dbconfig/20220827-145224-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T316186)', diff saved to https://phabricator.wikimedia.org/P33369 and previous config saved to /var/cache/conftool/dbconfig/20220827-145201-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P33368 and previous config saved to /var/cache/conftool/dbconfig/20220827-143654-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P33367 and previous config saved to /var/cache/conftool/dbconfig/20220827-142148-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T316186)', diff saved to https://phabricator.wikimedia.org/P33366 and previous config saved to /var/cache/conftool/dbconfig/20220827-140642-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T316186)', diff saved to https://phabricator.wikimedia.org/P33365 and previous config saved to /var/cache/conftool/dbconfig/20220827-135719-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33364 and previous config saved to /var/cache/conftool/dbconfig/20220827-135655-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P33363 and previous config saved to /var/cache/conftool/dbconfig/20220827-134149-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P33362 and previous config saved to /var/cache/conftool/dbconfig/20220827-132643-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33361 and previous config saved to /var/cache/conftool/dbconfig/20220827-131136-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T316186)', diff saved to https://phabricator.wikimedia.org/P33360 and previous config saved to /var/cache/conftool/dbconfig/20220827-121121-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33359 and previous config saved to /var/cache/conftool/dbconfig/20220827-120059-ladsgroup.json
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P33358 and previous config saved to /var/cache/conftool/dbconfig/20220827-114552-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P33357 and previous config saved to /var/cache/conftool/dbconfig/20220827-113046-ladsgroup.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33356 and previous config saved to /var/cache/conftool/dbconfig/20220827-111540-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T316186)', diff saved to https://phabricator.wikimedia.org/P33355 and previous config saved to /var/cache/conftool/dbconfig/20220827-101523-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T316186)', diff saved to https://phabricator.wikimedia.org/P33354 and previous config saved to /var/cache/conftool/dbconfig/20220827-101459-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P33353 and previous config saved to /var/cache/conftool/dbconfig/20220827-095953-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P33352 and previous config saved to /var/cache/conftool/dbconfig/20220827-094446-ladsgroup.json
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T316186)', diff saved to https://phabricator.wikimedia.org/P33351 and previous config saved to /var/cache/conftool/dbconfig/20220827-092940-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T316186)', diff saved to https://phabricator.wikimedia.org/P33350 and previous config saved to /var/cache/conftool/dbconfig/20220827-082924-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 01:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T316186)', diff saved to https://phabricator.wikimedia.org/P33349 and previous config saved to /var/cache/conftool/dbconfig/20220827-014831-ladsgroup.json
  • 01:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P33348 and previous config saved to /var/cache/conftool/dbconfig/20220827-013325-ladsgroup.json
  • 01:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P33347 and previous config saved to /var/cache/conftool/dbconfig/20220827-011819-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T316186)', diff saved to https://phabricator.wikimedia.org/P33346 and previous config saved to /var/cache/conftool/dbconfig/20220827-010313-ladsgroup.json
  • 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T316186)', diff saved to https://phabricator.wikimedia.org/P33345 and previous config saved to /var/cache/conftool/dbconfig/20220827-005555-ladsgroup.json
  • 00:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T316186)', diff saved to https://phabricator.wikimedia.org/P33344 and previous config saved to /var/cache/conftool/dbconfig/20220827-005525-ladsgroup.json
  • 00:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P33343 and previous config saved to /var/cache/conftool/dbconfig/20220827-004019-ladsgroup.json
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P33342 and previous config saved to /var/cache/conftool/dbconfig/20220827-002513-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T316186)', diff saved to https://phabricator.wikimedia.org/P33341 and previous config saved to /var/cache/conftool/dbconfig/20220827-001006-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T316186)', diff saved to https://phabricator.wikimedia.org/P33340 and previous config saved to /var/cache/conftool/dbconfig/20220827-000442-ladsgroup.json
  • 00:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33339 and previous config saved to /var/cache/conftool/dbconfig/20220827-000415-ladsgroup.json

2022-08-26

  • 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P33338 and previous config saved to /var/cache/conftool/dbconfig/20220826-234908-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P33337 and previous config saved to /var/cache/conftool/dbconfig/20220826-233402-ladsgroup.json
  • 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33336 and previous config saved to /var/cache/conftool/dbconfig/20220826-231856-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33335 and previous config saved to /var/cache/conftool/dbconfig/20220826-231540-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P33334 and previous config saved to /var/cache/conftool/dbconfig/20220826-230033-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P33333 and previous config saved to /var/cache/conftool/dbconfig/20220826-224527-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33332 and previous config saved to /var/cache/conftool/dbconfig/20220826-223021-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33331 and previous config saved to /var/cache/conftool/dbconfig/20220826-222409-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33330 and previous config saved to /var/cache/conftool/dbconfig/20220826-222345-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T316186)', diff saved to https://phabricator.wikimedia.org/P33329 and previous config saved to /var/cache/conftool/dbconfig/20220826-222320-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P33328 and previous config saved to /var/cache/conftool/dbconfig/20220826-220814-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118', diff saved to https://phabricator.wikimedia.org/P33327 and previous config saved to /var/cache/conftool/dbconfig/20220826-215307-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2118 (T316186)', diff saved to https://phabricator.wikimedia.org/P33326 and previous config saved to /var/cache/conftool/dbconfig/20220826-213801-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2118 (T316186)', diff saved to https://phabricator.wikimedia.org/P33325 and previous config saved to /var/cache/conftool/dbconfig/20220826-213140-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T316186)', diff saved to https://phabricator.wikimedia.org/P33324 and previous config saved to /var/cache/conftool/dbconfig/20220826-213115-ladsgroup.json
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P33323 and previous config saved to /var/cache/conftool/dbconfig/20220826-211608-ladsgroup.json
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P33322 and previous config saved to /var/cache/conftool/dbconfig/20220826-210102-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T316186)', diff saved to https://phabricator.wikimedia.org/P33321 and previous config saved to /var/cache/conftool/dbconfig/20220826-204555-ladsgroup.json
  • 20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T316186)', diff saved to https://phabricator.wikimedia.org/P33320 and previous config saved to /var/cache/conftool/dbconfig/20220826-203935-ladsgroup.json
  • 20:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 20:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 20:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T316186)', diff saved to https://phabricator.wikimedia.org/P33319 and previous config saved to /var/cache/conftool/dbconfig/20220826-203910-ladsgroup.json
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P33318 and previous config saved to /var/cache/conftool/dbconfig/20220826-202404-ladsgroup.json
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P33316 and previous config saved to /var/cache/conftool/dbconfig/20220826-200858-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T316186)', diff saved to https://phabricator.wikimedia.org/P33315 and previous config saved to /var/cache/conftool/dbconfig/20220826-195351-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T316186)', diff saved to https://phabricator.wikimedia.org/P33314 and previous config saved to /var/cache/conftool/dbconfig/20220826-194734-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 19:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T316186)', diff saved to https://phabricator.wikimedia.org/P33313 and previous config saved to /var/cache/conftool/dbconfig/20220826-194709-ladsgroup.json
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P33312 and previous config saved to /var/cache/conftool/dbconfig/20220826-193203-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P33311 and previous config saved to /var/cache/conftool/dbconfig/20220826-191657-ladsgroup.json
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T316186)', diff saved to https://phabricator.wikimedia.org/P33310 and previous config saved to /var/cache/conftool/dbconfig/20220826-190151-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T316186)', diff saved to https://phabricator.wikimedia.org/P33309 and previous config saved to /var/cache/conftool/dbconfig/20220826-185527-ladsgroup.json
  • 18:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33308 and previous config saved to /var/cache/conftool/dbconfig/20220826-185502-ladsgroup.json
  • 18:40 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@c5f46a4]: (no justification provided) (duration: 00m 10s)
  • 18:40 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@c5f46a4]: (no justification provided)
  • 18:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P33307 and previous config saved to /var/cache/conftool/dbconfig/20220826-183956-ladsgroup.json
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P33306 and previous config saved to /var/cache/conftool/dbconfig/20220826-182450-ladsgroup.json
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33305 and previous config saved to /var/cache/conftool/dbconfig/20220826-180943-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T316186)', diff saved to https://phabricator.wikimedia.org/P33304 and previous config saved to /var/cache/conftool/dbconfig/20220826-180223-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33303 and previous config saved to /var/cache/conftool/dbconfig/20220826-180157-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P33302 and previous config saved to /var/cache/conftool/dbconfig/20220826-174651-ladsgroup.json
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P33301 and previous config saved to /var/cache/conftool/dbconfig/20220826-173144-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33300 and previous config saved to /var/cache/conftool/dbconfig/20220826-171638-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T316186)', diff saved to https://phabricator.wikimedia.org/P33299 and previous config saved to /var/cache/conftool/dbconfig/20220826-170911-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33298 and previous config saved to /var/cache/conftool/dbconfig/20220826-170538-ladsgroup.json
  • 16:56 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5d95fe5]: Add job for MediaWiki history dumps. (duration: 00m 13s)
  • 16:56 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5d95fe5]: Add job for MediaWiki history dumps.
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P33297 and previous config saved to /var/cache/conftool/dbconfig/20220826-165032-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P33296 and previous config saved to /var/cache/conftool/dbconfig/20220826-163525-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33295 and previous config saved to /var/cache/conftool/dbconfig/20220826-162019-ladsgroup.json
  • 15:50 jynus: rolling restart of ms-backup1001,2, ms-backup2001,2
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33293 and previous config saved to /var/cache/conftool/dbconfig/20220826-152003-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33292 and previous config saved to /var/cache/conftool/dbconfig/20220826-151921-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P33291 and previous config saved to /var/cache/conftool/dbconfig/20220826-150415-ladsgroup.json
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P33290 and previous config saved to /var/cache/conftool/dbconfig/20220826-144908-ladsgroup.json
  • 14:38 jynus: rolling restart of backup1004-9, backup2004-9
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33289 and previous config saved to /var/cache/conftool/dbconfig/20220826-143402-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33288 and previous config saved to /var/cache/conftool/dbconfig/20220826-142945-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P33286 and previous config saved to /var/cache/conftool/dbconfig/20220826-141438-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P33285 and previous config saved to /var/cache/conftool/dbconfig/20220826-135932-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33284 and previous config saved to /var/cache/conftool/dbconfig/20220826-134426-ladsgroup.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T316186)', diff saved to https://phabricator.wikimedia.org/P33283 and previous config saved to /var/cache/conftool/dbconfig/20220826-133318-ladsgroup.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33281 and previous config saved to /var/cache/conftool/dbconfig/20220826-132817-root.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33280 and previous config saved to /var/cache/conftool/dbconfig/20220826-132751-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33279 and previous config saved to /var/cache/conftool/dbconfig/20220826-132304-ladsgroup.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33278 and previous config saved to /var/cache/conftool/dbconfig/20220826-131312-root.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P33277 and previous config saved to /var/cache/conftool/dbconfig/20220826-130756-ladsgroup.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33276 and previous config saved to /var/cache/conftool/dbconfig/20220826-125808-root.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P33275 and previous config saved to /var/cache/conftool/dbconfig/20220826-125250-ladsgroup.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33274 and previous config saved to /var/cache/conftool/dbconfig/20220826-124303-root.json
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33273 and previous config saved to /var/cache/conftool/dbconfig/20220826-123743-ladsgroup.json
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33272 and previous config saved to /var/cache/conftool/dbconfig/20220826-122758-root.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33271 and previous config saved to /var/cache/conftool/dbconfig/20220826-122527-ladsgroup.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33270 and previous config saved to /var/cache/conftool/dbconfig/20220826-121253-root.json
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P33269 and previous config saved to /var/cache/conftool/dbconfig/20220826-121021-ladsgroup.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33268 and previous config saved to /var/cache/conftool/dbconfig/20220826-115748-root.json
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P33267 and previous config saved to /var/cache/conftool/dbconfig/20220826-115514-ladsgroup.json
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33266 and previous config saved to /var/cache/conftool/dbconfig/20220826-114243-root.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33265 and previous config saved to /var/cache/conftool/dbconfig/20220826-114008-ladsgroup.json
  • 11:37 moritzm: installing intel-microcode updates on stretch hosts
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33264 and previous config saved to /var/cache/conftool/dbconfig/20220826-113511-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T316186)', diff saved to https://phabricator.wikimedia.org/P33263 and previous config saved to /var/cache/conftool/dbconfig/20220826-113347-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T316186)', diff saved to https://phabricator.wikimedia.org/P33262 and previous config saved to /var/cache/conftool/dbconfig/20220826-112946-ladsgroup.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33261 and previous config saved to /var/cache/conftool/dbconfig/20220826-112739-root.json
  • 11:19 moritzm: uploaded intel-microcode 3.20220510.1~wmf9u1 to apt.wikimedia.org
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P33260 and previous config saved to /var/cache/conftool/dbconfig/20220826-111440-ladsgroup.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1194 (re)pooling @ 1%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33259 and previous config saved to /var/cache/conftool/dbconfig/20220826-111234-root.json
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P33258 and previous config saved to /var/cache/conftool/dbconfig/20220826-105934-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T316186)', diff saved to https://phabricator.wikimedia.org/P33257 and previous config saved to /var/cache/conftool/dbconfig/20220826-104427-ladsgroup.json
  • 10:44 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T316186)', diff saved to https://phabricator.wikimedia.org/P33256 and previous config saved to /var/cache/conftool/dbconfig/20220826-103707-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 10:33 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-test-eqiad cluster: Roll restart of jvm daemons.
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33255 and previous config saved to /var/cache/conftool/dbconfig/20220826-102510-ladsgroup.json
  • 10:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33254 and previous config saved to /var/cache/conftool/dbconfig/20220826-102334-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33253 and previous config saved to /var/cache/conftool/dbconfig/20220826-102117-ladsgroup.json
  • 10:13 vgutierrez: stop testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/826785 in cp6016
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P33252 and previous config saved to /var/cache/conftool/dbconfig/20220826-100611-ladsgroup.json
  • 09:56 vgutierrez: testing https://gerrit.wikimedia.org/r/c/operations/puppet/+/826785 in cp6016
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P33251 and previous config saved to /var/cache/conftool/dbconfig/20220826-095104-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33250 and previous config saved to /var/cache/conftool/dbconfig/20220826-093558-ladsgroup.json
  • 09:33 vgutierrez: disable origin coalescing in cp6007 and cp6008 - T315911
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33249 and previous config saved to /var/cache/conftool/dbconfig/20220826-093051-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33248 and previous config saved to /var/cache/conftool/dbconfig/20220826-093034-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33247 and previous config saved to /var/cache/conftool/dbconfig/20220826-093000-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P33246 and previous config saved to /var/cache/conftool/dbconfig/20220826-091454-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P33245 and previous config saved to /var/cache/conftool/dbconfig/20220826-085947-ladsgroup.json
  • 08:47 vgutierrez: Increase roll-out of query-sorting to 30% - T314868
  • 08:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33244 and previous config saved to /var/cache/conftool/dbconfig/20220826-084441-ladsgroup.json
  • 08:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33243 and previous config saved to /var/cache/conftool/dbconfig/20220826-083424-ladsgroup.json
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2025.codfw.wmnet to cluster codfw and group D
  • 08:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2025.codfw.wmnet to cluster codfw and group D
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P33242 and previous config saved to /var/cache/conftool/dbconfig/20220826-081918-ladsgroup.json
  • 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P33241 and previous config saved to /var/cache/conftool/dbconfig/20220826-080411-ladsgroup.json
  • 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2025.codfw.wmnet with OS bullseye
  • 07:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33240 and previous config saved to /var/cache/conftool/dbconfig/20220826-074905-ladsgroup.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33239 and previous config saved to /var/cache/conftool/dbconfig/20220826-074801-root.json
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2025.codfw.wmnet with reason: host reimage
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33238 and previous config saved to /var/cache/conftool/dbconfig/20220826-074434-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33237 and previous config saved to /var/cache/conftool/dbconfig/20220826-074412-root.json
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33236 and previous config saved to /var/cache/conftool/dbconfig/20220826-074252-ladsgroup.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33235 and previous config saved to /var/cache/conftool/dbconfig/20220826-074140-root.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33234 and previous config saved to /var/cache/conftool/dbconfig/20220826-074126-ladsgroup.json
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2025.codfw.wmnet with reason: host reimage
  • 07:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33233 and previous config saved to /var/cache/conftool/dbconfig/20220826-074052-ladsgroup.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33232 and previous config saved to /var/cache/conftool/dbconfig/20220826-073256-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33231 and previous config saved to /var/cache/conftool/dbconfig/20220826-072929-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33230 and previous config saved to /var/cache/conftool/dbconfig/20220826-072908-root.json
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33229 and previous config saved to /var/cache/conftool/dbconfig/20220826-072635-root.json
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P33228 and previous config saved to /var/cache/conftool/dbconfig/20220826-072545-ladsgroup.json
  • 07:24 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2025.codfw.wmnet with OS bullseye
  • 07:23 vgutierrez: Increase roll-out of query-sorting to 15% - T314868
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33227 and previous config saved to /var/cache/conftool/dbconfig/20220826-071751-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33226 and previous config saved to /var/cache/conftool/dbconfig/20220826-071424-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33225 and previous config saved to /var/cache/conftool/dbconfig/20220826-071403-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33224 and previous config saved to /var/cache/conftool/dbconfig/20220826-071131-root.json
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P33223 and previous config saved to /var/cache/conftool/dbconfig/20220826-071039-ladsgroup.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33222 and previous config saved to /var/cache/conftool/dbconfig/20220826-070247-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33221 and previous config saved to /var/cache/conftool/dbconfig/20220826-065919-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33220 and previous config saved to /var/cache/conftool/dbconfig/20220826-065858-root.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33219 and previous config saved to /var/cache/conftool/dbconfig/20220826-065626-root.json
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33218 and previous config saved to /var/cache/conftool/dbconfig/20220826-065533-ladsgroup.json
  • 06:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33217 and previous config saved to /var/cache/conftool/dbconfig/20220826-065217-ladsgroup.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33216 and previous config saved to /var/cache/conftool/dbconfig/20220826-064742-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33215 and previous config saved to /var/cache/conftool/dbconfig/20220826-064414-root.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33214 and previous config saved to /var/cache/conftool/dbconfig/20220826-064353-root.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33213 and previous config saved to /var/cache/conftool/dbconfig/20220826-064121-root.json
  • 06:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P33212 and previous config saved to /var/cache/conftool/dbconfig/20220826-063711-ladsgroup.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33211 and previous config saved to /var/cache/conftool/dbconfig/20220826-063237-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33210 and previous config saved to /var/cache/conftool/dbconfig/20220826-062910-root.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33209 and previous config saved to /var/cache/conftool/dbconfig/20220826-062849-root.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33208 and previous config saved to /var/cache/conftool/dbconfig/20220826-062616-root.json
  • 06:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P33207 and previous config saved to /var/cache/conftool/dbconfig/20220826-062205-ladsgroup.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 3%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33206 and previous config saved to /var/cache/conftool/dbconfig/20220826-061732-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33205 and previous config saved to /var/cache/conftool/dbconfig/20220826-061405-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33204 and previous config saved to /var/cache/conftool/dbconfig/20220826-061344-root.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33203 and previous config saved to /var/cache/conftool/dbconfig/20220826-061112-root.json
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P33202 and previous config saved to /var/cache/conftool/dbconfig/20220826-060734-ladsgroup.json
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33201 and previous config saved to /var/cache/conftool/dbconfig/20220826-060658-ladsgroup.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33200 and previous config saved to /var/cache/conftool/dbconfig/20220826-060227-root.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33199 and previous config saved to /var/cache/conftool/dbconfig/20220826-060203-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T316186)', diff saved to https://phabricator.wikimedia.org/P33198 and previous config saved to /var/cache/conftool/dbconfig/20220826-060146-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 05:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33197 and previous config saved to /var/cache/conftool/dbconfig/20220826-055900-root.json
  • 05:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33196 and previous config saved to /var/cache/conftool/dbconfig/20220826-055839-root.json
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33195 and previous config saved to /var/cache/conftool/dbconfig/20220826-055607-root.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P33194 and previous config saved to /var/cache/conftool/dbconfig/20220826-055553-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P33193 and previous config saved to /var/cache/conftool/dbconfig/20220826-055420-ladsgroup.json
  • 05:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P33192 and previous config saved to /var/cache/conftool/dbconfig/20220826-055229-ladsgroup.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P33191 and previous config saved to /var/cache/conftool/dbconfig/20220826-054722-root.json
  • 05:47 marostegui: Failover m2-master
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1193 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33190 and previous config saved to /var/cache/conftool/dbconfig/20220826-054356-root.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1192 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33189 and previous config saved to /var/cache/conftool/dbconfig/20220826-054334-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1185 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33188 and previous config saved to /var/cache/conftool/dbconfig/20220826-054102-root.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P33186 and previous config saved to /var/cache/conftool/dbconfig/20220826-054048-ladsgroup.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P33185 and previous config saved to /var/cache/conftool/dbconfig/20220826-054023-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1194 for the first time in s7 T313569', diff saved to https://phabricator.wikimedia.org/P33184 and previous config saved to /var/cache/conftool/dbconfig/20220826-053954-marostegui.json
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P33183 and previous config saved to /var/cache/conftool/dbconfig/20220826-053915-ladsgroup.json
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P33182 and previous config saved to /var/cache/conftool/dbconfig/20220826-053724-ladsgroup.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1193 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33181 and previous config saved to /var/cache/conftool/dbconfig/20220826-052715-marostegui.json
  • 05:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P33180 and previous config saved to /var/cache/conftool/dbconfig/20220826-052544-ladsgroup.json
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P33179 and previous config saved to /var/cache/conftool/dbconfig/20220826-052410-ladsgroup.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1192 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33178 and previous config saved to /var/cache/conftool/dbconfig/20220826-052233-marostegui.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1138 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P33177 and previous config saved to /var/cache/conftool/dbconfig/20220826-052219-ladsgroup.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1185 for the first time in s5 T313569', diff saved to https://phabricator.wikimedia.org/P33176 and previous config saved to /var/cache/conftool/dbconfig/20220826-051721-marostegui.json
  • 05:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P33175 and previous config saved to /var/cache/conftool/dbconfig/20220826-051039-ladsgroup.json
  • 05:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P33174 and previous config saved to /var/cache/conftool/dbconfig/20220826-050906-ladsgroup.json
  • 05:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling for maintenance', diff saved to https://phabricator.wikimedia.org/P33173 and previous config saved to /var/cache/conftool/dbconfig/20220826-050652-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 05:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312160)', diff saved to https://phabricator.wikimedia.org/P33172 and previous config saved to /var/cache/conftool/dbconfig/20220826-003819-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P33171 and previous config saved to /var/cache/conftool/dbconfig/20220826-002313-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P33170 and previous config saved to /var/cache/conftool/dbconfig/20220826-000807-ladsgroup.json

2022-08-25

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312160)', diff saved to https://phabricator.wikimedia.org/P33169 and previous config saved to /var/cache/conftool/dbconfig/20220825-235300-ladsgroup.json
  • 22:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33168 and previous config saved to /var/cache/conftool/dbconfig/20220825-223805-ladsgroup.json
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P33167 and previous config saved to /var/cache/conftool/dbconfig/20220825-222259-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T312160)', diff saved to https://phabricator.wikimedia.org/P33165 and previous config saved to /var/cache/conftool/dbconfig/20220825-220937-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 22:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P33164 and previous config saved to /var/cache/conftool/dbconfig/20220825-220753-ladsgroup.json
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33163 and previous config saved to /var/cache/conftool/dbconfig/20220825-215247-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33162 and previous config saved to /var/cache/conftool/dbconfig/20220825-214722-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 21:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33161 and previous config saved to /var/cache/conftool/dbconfig/20220825-214649-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33160 and previous config saved to /var/cache/conftool/dbconfig/20220825-213143-ladsgroup.json
  • 21:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P33159 and previous config saved to /var/cache/conftool/dbconfig/20220825-211637-ladsgroup.json
  • 21:12 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 21:02 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33158 and previous config saved to /var/cache/conftool/dbconfig/20220825-210130-ladsgroup.json
  • 20:56 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:47 urbanecm: UTC late B&C window done
  • 20:46 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1aafdf0: cswiki: Add extendedconfirmed group/protection level (T316283) (duration: 03m 42s)
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
  • 20:45 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:39 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/VisualEditor/: 223e81f: Update VE core submodule to master (d4c438548; T316219) (duration: 03m 42s)
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:35 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Timeless/: ba0e981: Hide new associatedPages navigation items (T316196) (duration: 03m 41s)
  • 20:33 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Vector/resources/skins.vector.styles/layouts/screen.less: fe3382e: Add clearfix to .mw-body-subheader (T316134, T316095) (duration: 03m 25s)
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T316186)', diff saved to https://phabricator.wikimedia.org/P33157 and previous config saved to /var/cache/conftool/dbconfig/20220825-202716-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 20:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33156 and previous config saved to /var/cache/conftool/dbconfig/20220825-202647-ladsgroup.json
  • 20:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f37eff3: Make DiscussionTools autotopicsub also opt-out on A/B test wikis (T314693) (duration: 03m 37s)
  • 20:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312160)', diff saved to https://phabricator.wikimedia.org/P33155 and previous config saved to /var/cache/conftool/dbconfig/20220825-201756-ladsgroup.json
  • 20:17 urbanecm: [urbanecm@deploy1002 ~]$ rm /var/lock/scap.operations_mediawiki-config.lock # connection to deploy1002 handled, to let me re-sync
  • 20:14 urandom: re-rebooting ms-be2067 to "fix" disk enumeration(?) -- T314049
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:11 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 20:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33154 and previous config saved to /var/cache/conftool/dbconfig/20220825-201141-ladsgroup.json
  • 20:07 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P33153 and previous config saved to /var/cache/conftool/dbconfig/20220825-200250-ladsgroup.json
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P33152 and previous config saved to /var/cache/conftool/dbconfig/20220825-195635-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P33151 and previous config saved to /var/cache/conftool/dbconfig/20220825-194744-ladsgroup.json
  • 19:42 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33150 and previous config saved to /var/cache/conftool/dbconfig/20220825-194129-ladsgroup.json
  • 19:41 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 19:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices1003
  • 19:37 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 urandom: rebooting ms-be2067 to "fix" disk enumeration(?) -- T314049
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T316186)', diff saved to https://phabricator.wikimedia.org/P33149 and previous config saved to /var/cache/conftool/dbconfig/20220825-193513-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T316186)', diff saved to https://phabricator.wikimedia.org/P33148 and previous config saved to /var/cache/conftool/dbconfig/20220825-193430-ladsgroup.json
  • 19:33 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 19:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312160)', diff saved to https://phabricator.wikimedia.org/P33147 and previous config saved to /var/cache/conftool/dbconfig/20220825-193238-ladsgroup.json
  • 19:29 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1003
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33146 and previous config saved to /var/cache/conftool/dbconfig/20220825-191924-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P33145 and previous config saved to /var/cache/conftool/dbconfig/20220825-190417-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T316186)', diff saved to https://phabricator.wikimedia.org/P33144 and previous config saved to /var/cache/conftool/dbconfig/20220825-184911-ladsgroup.json
  • 18:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 18:48 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d00af45]: bump elasticsearch-hadoop to 7.10.2 (duration: 02m 07s)
  • 18:47 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic elasticsearch and plugin upgrade - bking@cumin2002 - T316159
  • 18:45 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d00af45]: bump elasticsearch-hadoop to 7.10.2
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T316186)', diff saved to https://phabricator.wikimedia.org/P33143 and previous config saved to /var/cache/conftool/dbconfig/20220825-184301-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T316186)', diff saved to https://phabricator.wikimedia.org/P33142 and previous config saved to /var/cache/conftool/dbconfig/20220825-184233-ladsgroup.json
  • 18:36 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 18:36 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 18:35 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 18:34 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 18:34 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 18:33 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 18:33 ottomata: rolling restart of eventgate-analytics-external to pick up retroactive schema change for android schemas in T316047
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P33141 and previous config saved to /var/cache/conftool/dbconfig/20220825-182727-ladsgroup.json
  • 18:19 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 18:18 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 18:18 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 18:13 dancy@deploy1002: Installation of scap version "4.15.0" completed for 557 hosts
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P33140 and previous config saved to /var/cache/conftool/dbconfig/20220825-181221-ladsgroup.json
  • 18:11 dancy@deploy1002: Installing scap version "4.15.0" for 557 hosts
  • 18:11 dancy@deploy1002: install-world aborted: (duration: 00m 02s)
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T316186)', diff saved to https://phabricator.wikimedia.org/P33139 and previous config saved to /var/cache/conftool/dbconfig/20220825-175715-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T316186)', diff saved to https://phabricator.wikimedia.org/P33138 and previous config saved to /var/cache/conftool/dbconfig/20220825-174946-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2115 (T312160)', diff saved to https://phabricator.wikimedia.org/P33137 and previous config saved to /var/cache/conftool/dbconfig/20220825-174826-ladsgroup.json
  • 17:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33136 and previous config saved to /var/cache/conftool/dbconfig/20220825-173731-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P33135 and previous config saved to /var/cache/conftool/dbconfig/20220825-172225-ladsgroup.json
  • 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P33133 and previous config saved to /var/cache/conftool/dbconfig/20220825-170719-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33132 and previous config saved to /var/cache/conftool/dbconfig/20220825-165213-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33131 and previous config saved to /var/cache/conftool/dbconfig/20220825-164556-ladsgroup.json
  • 16:40 urandom: shutting down ms-be2067.codfw.wmnet for backplane replacement -- T314049
  • 16:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: backplane replacement
  • 16:37 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: backplane replacement
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P33130 and previous config saved to /var/cache/conftool/dbconfig/20220825-163050-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P33129 and previous config saved to /var/cache/conftool/dbconfig/20220825-161544-ladsgroup.json
  • 16:07 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 16:07 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 16:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312160)', diff saved to https://phabricator.wikimedia.org/P33128 and previous config saved to /var/cache/conftool/dbconfig/20220825-160250-ladsgroup.json
  • 16:00 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33127 and previous config saved to /var/cache/conftool/dbconfig/20220825-160036-ladsgroup.json
  • 16:00 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T316186)', diff saved to https://phabricator.wikimedia.org/P33126 and previous config saved to /var/cache/conftool/dbconfig/20220825-155529-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T316186)', diff saved to https://phabricator.wikimedia.org/P33125 and previous config saved to /var/cache/conftool/dbconfig/20220825-155506-ladsgroup.json
  • 15:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P33124 and previous config saved to /var/cache/conftool/dbconfig/20220825-155401-ladsgroup.json
  • 15:52 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 15:52 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 15:50 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 15:50 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P33123 and previous config saved to /var/cache/conftool/dbconfig/20220825-154743-ladsgroup.json
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P33122 and previous config saved to /var/cache/conftool/dbconfig/20220825-154438-ladsgroup.json
  • 15:42 jynus: restart backup1002 (interrupted before), backup1003, backup2003
  • 15:41 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 15:41 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P33121 and previous config saved to /var/cache/conftool/dbconfig/20220825-153237-ladsgroup.json
  • 15:31 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 09s)
  • 15:31 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33120 and previous config saved to /var/cache/conftool/dbconfig/20220825-152932-ladsgroup.json
  • 15:27 bmansurov@deploy1002: Finished deploy [airflow-dags/research@5712187]: (no justification provided) (duration: 00m 20s)
  • 15:26 bmansurov@deploy1002: Started deploy [airflow-dags/research@5712187]: (no justification provided)
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T316186)', diff saved to https://phabricator.wikimedia.org/P33119 and previous config saved to /var/cache/conftool/dbconfig/20220825-152417-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312160)', diff saved to https://phabricator.wikimedia.org/P33118 and previous config saved to /var/cache/conftool/dbconfig/20220825-151731-ladsgroup.json
  • 14:44 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
  • 14:43 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
  • 14:42 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
  • 14:42 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
  • 14:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
  • 14:36 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
  • 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on ganeti2025.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 14:32 vgutierrez: enable origin coalescing in ats-be@cp600[78] [expect crashes] - T315911
  • 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet
  • 14:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1004.eqiad.wmnet
  • 14:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 14:15 claime: finished rebooting people1003 (people.wikimedia.org)
  • 14:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 14:13 claime: rebooting people1003 (people.wikimedia.org)
  • 14:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T316186)', diff saved to https://phabricator.wikimedia.org/P33117 and previous config saved to /var/cache/conftool/dbconfig/20220825-140915-ladsgroup.json
  • 14:07 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 13:57 hashar@deploy1002: Finished scap: Backport for CX3 Build 0.2.0+20220825 (T309986 T301222) (duration: 24m 56s)
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P33116 and previous config saved to /var/cache/conftool/dbconfig/20220825-135408-ladsgroup.json
  • 13:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1120 (T312160)', diff saved to https://phabricator.wikimedia.org/P33115 and previous config saved to /var/cache/conftool/dbconfig/20220825-134318-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1120.eqiad.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1120.eqiad.wmnet with reason: Maintenance
  • 13:39 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1003.eqiad.wmnet
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P33114 and previous config saved to /var/cache/conftool/dbconfig/20220825-133902-ladsgroup.json
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:32 hashar@deploy1002: Started scap: Backport for CX3 Build 0.2.0+20220825 (T309986 T301222)
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T316186)', diff saved to https://phabricator.wikimedia.org/P33113 and previous config saved to /var/cache/conftool/dbconfig/20220825-132356-ladsgroup.json
  • 13:19 vgutierrez: disable origin coalescing in ats-be globally - T315911
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T316186)', diff saved to https://phabricator.wikimedia.org/P33112 and previous config saved to /var/cache/conftool/dbconfig/20220825-131735-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P33111 and previous config saved to /var/cache/conftool/dbconfig/20220825-130950-ladsgroup.json
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T316186)', diff saved to https://phabricator.wikimedia.org/P33110 and previous config saved to /var/cache/conftool/dbconfig/20220825-130235-ladsgroup.json
  • 13:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P33109 and previous config saved to /var/cache/conftool/dbconfig/20220825-125806-ladsgroup.json
  • 12:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:49 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet
  • 12:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet
  • 12:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:46 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db2114.codfw.wmnet
  • 12:45 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.26 refs T314187
  • 12:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.reboot-single for host db2114.codfw.wmnet
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T316186)', diff saved to https://phabricator.wikimedia.org/P33108 and previous config saved to /var/cache/conftool/dbconfig/20220825-123448-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Testing a script
  • 12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Testing a script
  • 12:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 11 days, 0:00:00 on ms-fe1012.eqiad.wmnet with reason: known depooled, left for investigation
  • 12:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 11 days, 0:00:00 on ms-fe1012.eqiad.wmnet with reason: known depooled, left for investigation
  • 11:57 godog: roll-restart swift-proxy on thanos-fe* and ms-fe* (not ms-fe1012)
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 11:40 godog: depool ms-fe1012, leave swift-proxy alone for investigation
  • 11:32 godog: restart swift-proxy on ms-fe1010
  • 11:29 marostegui: Failover m1-master
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P33106 and previous config saved to /var/cache/conftool/dbconfig/20220825-110448-ladsgroup.json
  • 10:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P33105 and previous config saved to /var/cache/conftool/dbconfig/20220825-104942-ladsgroup.json
  • 10:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P33104 and previous config saved to /var/cache/conftool/dbconfig/20220825-103436-ladsgroup.json
  • 10:23 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P33103 and previous config saved to /var/cache/conftool/dbconfig/20220825-101930-ladsgroup.json
  • 10:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312160)', diff saved to https://phabricator.wikimedia.org/P33102 and previous config saved to /var/cache/conftool/dbconfig/20220825-100915-ladsgroup.json
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P33100 and previous config saved to /var/cache/conftool/dbconfig/20220825-100010-root.json
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P33099 and previous config saved to /var/cache/conftool/dbconfig/20220825-095942-ladsgroup.json
  • 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33098 and previous config saved to /var/cache/conftool/dbconfig/20220825-095611-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P33097 and previous config saved to /var/cache/conftool/dbconfig/20220825-095408-ladsgroup.json
  • 09:51 moritzm: installing libxslt security updates on bullseye
  • 09:50 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:49 jynus: restart backup1002, backup2002
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33096 and previous config saved to /var/cache/conftool/dbconfig/20220825-094646-ladsgroup.json
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33095 and previous config saved to /var/cache/conftool/dbconfig/20220825-094438-root.json
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33094 and previous config saved to /var/cache/conftool/dbconfig/20220825-094401-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33093 and previous config saved to /var/cache/conftool/dbconfig/20220825-094353-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33092 and previous config saved to /var/cache/conftool/dbconfig/20220825-094345-root.json
  • 09:39 marostegui: Reboot standby dbproxy hosts
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P33091 and previous config saved to /var/cache/conftool/dbconfig/20220825-093902-ladsgroup.json
  • 09:35 jynus: restart backup2001
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P33090 and previous config saved to /var/cache/conftool/dbconfig/20220825-093140-ladsgroup.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33089 and previous config saved to /var/cache/conftool/dbconfig/20220825-092933-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33088 and previous config saved to /var/cache/conftool/dbconfig/20220825-092856-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33087 and previous config saved to /var/cache/conftool/dbconfig/20220825-092848-root.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33086 and previous config saved to /var/cache/conftool/dbconfig/20220825-092840-root.json
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312160)', diff saved to https://phabricator.wikimedia.org/P33085 and previous config saved to /var/cache/conftool/dbconfig/20220825-092356-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T314041)', diff saved to https://phabricator.wikimedia.org/P33084 and previous config saved to /var/cache/conftool/dbconfig/20220825-091633-ladsgroup.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P33083 and previous config saved to /var/cache/conftool/dbconfig/20220825-091448-root.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T314041)', diff saved to https://phabricator.wikimedia.org/P33082 and previous config saved to /var/cache/conftool/dbconfig/20220825-091447-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33081 and previous config saved to /var/cache/conftool/dbconfig/20220825-091428-root.json
  • 09:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33080 and previous config saved to /var/cache/conftool/dbconfig/20220825-091351-root.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33079 and previous config saved to /var/cache/conftool/dbconfig/20220825-091344-root.json
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33078 and previous config saved to /var/cache/conftool/dbconfig/20220825-091336-root.json
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T314041)', diff saved to https://phabricator.wikimedia.org/P33077 and previous config saved to /var/cache/conftool/dbconfig/20220825-091325-ladsgroup.json
  • 09:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:05 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.26 refs T314187 (duration: 03m 30s)
  • 09:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:02 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.26 refs T314187
  • 09:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P33075 and previous config saved to /var/cache/conftool/dbconfig/20220825-085943-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33074 and previous config saved to /var/cache/conftool/dbconfig/20220825-085924-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33073 and previous config saved to /var/cache/conftool/dbconfig/20220825-085847-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33072 and previous config saved to /var/cache/conftool/dbconfig/20220825-085839-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33071 and previous config saved to /var/cache/conftool/dbconfig/20220825-085831-root.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33070 and previous config saved to /var/cache/conftool/dbconfig/20220825-085819-ladsgroup.json
  • 08:54 moritzm: installing curl security updates on bullseye
  • 08:50 moritzm: installing gnutls28 security updates on bullseye
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P33069 and previous config saved to /var/cache/conftool/dbconfig/20220825-084438-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33068 and previous config saved to /var/cache/conftool/dbconfig/20220825-084419-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33067 and previous config saved to /var/cache/conftool/dbconfig/20220825-084342-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33066 and previous config saved to /var/cache/conftool/dbconfig/20220825-084334-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33065 and previous config saved to /var/cache/conftool/dbconfig/20220825-084326-root.json
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P33064 and previous config saved to /var/cache/conftool/dbconfig/20220825-084313-ladsgroup.json
  • 08:39 jynus: restarting backupmon1001
  • 08:30 marostegui: Failover m1 from db1164 to db1195 - T315864
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P33063 and previous config saved to /var/cache/conftool/dbconfig/20220825-082933-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33062 and previous config saved to /var/cache/conftool/dbconfig/20220825-082915-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33061 and previous config saved to /var/cache/conftool/dbconfig/20220825-082837-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33060 and previous config saved to /var/cache/conftool/dbconfig/20220825-082830-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33059 and previous config saved to /var/cache/conftool/dbconfig/20220825-082821-root.json
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T314041)', diff saved to https://phabricator.wikimedia.org/P33058 and previous config saved to /var/cache/conftool/dbconfig/20220825-082807-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T314041)', diff saved to https://phabricator.wikimedia.org/P33057 and previous config saved to /var/cache/conftool/dbconfig/20220825-082621-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 08:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T314041)', diff saved to https://phabricator.wikimedia.org/P33056 and previous config saved to /var/cache/conftool/dbconfig/20220825-082559-ladsgroup.json
  • 08:23 vgutierrez: Increase roll-out of query-sorting to 5% - T314868
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P33055 and previous config saved to /var/cache/conftool/dbconfig/20220825-081429-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33054 and previous config saved to /var/cache/conftool/dbconfig/20220825-081410-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33053 and previous config saved to /var/cache/conftool/dbconfig/20220825-081333-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33052 and previous config saved to /var/cache/conftool/dbconfig/20220825-081325-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33051 and previous config saved to /var/cache/conftool/dbconfig/20220825-081316-root.json
  • 08:13 jynus: stopping bacula services on backup1001 T315864
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33050 and previous config saved to /var/cache/conftool/dbconfig/20220825-081053-ladsgroup.json
  • 08:09 marostegui: Reboot db1195 for kernel upgrade T315864
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P33049 and previous config saved to /var/cache/conftool/dbconfig/20220825-075924-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33048 and previous config saved to /var/cache/conftool/dbconfig/20220825-075905-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33047 and previous config saved to /var/cache/conftool/dbconfig/20220825-075828-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33046 and previous config saved to /var/cache/conftool/dbconfig/20220825-075820-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33045 and previous config saved to /var/cache/conftool/dbconfig/20220825-075811-root.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P33044 and previous config saved to /var/cache/conftool/dbconfig/20220825-075547-ladsgroup.json
  • 07:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Switchover m1 T315864
  • 07:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db[2132,2160].codfw.wmnet,db[1117,1164,1195].eqiad.wmnet with reason: Switchover m1 T315864
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1191 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33042 and previous config saved to /var/cache/conftool/dbconfig/20220825-074400-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1190 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33041 and previous config saved to /var/cache/conftool/dbconfig/20220825-074323-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1188 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33040 and previous config saved to /var/cache/conftool/dbconfig/20220825-074315-root.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1186 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P33039 and previous config saved to /var/cache/conftool/dbconfig/20220825-074307-root.json
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T312160)', diff saved to https://phabricator.wikimedia.org/P33038 and previous config saved to /var/cache/conftool/dbconfig/20220825-074220-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T314041)', diff saved to https://phabricator.wikimedia.org/P33037 and previous config saved to /var/cache/conftool/dbconfig/20220825-074041-ladsgroup.json
  • 07:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T314041)', diff saved to https://phabricator.wikimedia.org/P33036 and previous config saved to /var/cache/conftool/dbconfig/20220825-073855-ladsgroup.json
  • 07:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T314041)', diff saved to https://phabricator.wikimedia.org/P33035 and previous config saved to /var/cache/conftool/dbconfig/20220825-073834-ladsgroup.json
  • 07:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:36 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1012 to pc2 master T315526 (duration: 03m 39s)
  • 07:34 marostegui: Promote pc1012 back as pc2 master T315526
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33034 and previous config saved to /var/cache/conftool/dbconfig/20220825-072340-root.json
  • 07:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33033 and previous config saved to /var/cache/conftool/dbconfig/20220825-072327-ladsgroup.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33032 and previous config saved to /var/cache/conftool/dbconfig/20220825-070835-root.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P33031 and previous config saved to /var/cache/conftool/dbconfig/20220825-070821-ladsgroup.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33030 and previous config saved to /var/cache/conftool/dbconfig/20220825-065331-root.json
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T314041)', diff saved to https://phabricator.wikimedia.org/P33029 and previous config saved to /var/cache/conftool/dbconfig/20220825-065315-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T314041)', diff saved to https://phabricator.wikimedia.org/P33028 and previous config saved to /var/cache/conftool/dbconfig/20220825-065128-ladsgroup.json
  • 06:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33027 and previous config saved to /var/cache/conftool/dbconfig/20220825-063826-root.json
  • 06:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 06:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 06:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 06:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 06:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maint on s4 old master
  • 06:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1138.eqiad.wmnet with reason: Maint on s4 old master
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1138 T315419', diff saved to https://phabricator.wikimedia.org/P33026 and previous config saved to /var/cache/conftool/dbconfig/20220825-062852-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T315419', diff saved to https://phabricator.wikimedia.org/P33025 and previous config saved to /var/cache/conftool/dbconfig/20220825-062425-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T315419', diff saved to https://phabricator.wikimedia.org/P33024 and previous config saved to /var/cache/conftool/dbconfig/20220825-062353-ladsgroup.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33023 and previous config saved to /var/cache/conftool/dbconfig/20220825-062321-root.json
  • 06:22 Amir1: Starting s4 eqiad failover from db1138 to db1160 - T315419
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 5%: Repooling after cloning db1185', diff saved to https://phabricator.wikimedia.org/P33022 and previous config saved to /var/cache/conftool/dbconfig/20220825-060816-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P33020 and previous config saved to /var/cache/conftool/dbconfig/20220825-060601-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1191 with minimal weight in s7 T313569', diff saved to https://phabricator.wikimedia.org/P33019 and previous config saved to /var/cache/conftool/dbconfig/20220825-055057-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1191 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33018 and previous config saved to /var/cache/conftool/dbconfig/20220825-055038-marostegui.json
  • 05:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 05:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 05:46 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/includes/page/Article.php: Backport: Display page namespace with spaces instead of underscores when page doesn't exist (T316092) (duration: 03m 32s)
  • 05:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 05:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1190 with minimal weight in s4 T313569', diff saved to https://phabricator.wikimedia.org/P33017 and previous config saved to /var/cache/conftool/dbconfig/20220825-053310-root.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1190 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33016 and previous config saved to /var/cache/conftool/dbconfig/20220825-053253-marostegui.json
  • 05:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1160 with weight 0 T315419', diff saved to https://phabricator.wikimedia.org/P33015 and previous config saved to /var/cache/conftool/dbconfig/20220825-052415-ladsgroup.json
  • 05:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T315419
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T315419
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1188 with minimal weight in s2 T313569', diff saved to https://phabricator.wikimedia.org/P33013 and previous config saved to /var/cache/conftool/dbconfig/20220825-051754-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1188 to dbctl T313569', diff saved to https://phabricator.wikimedia.org/P33012 and previous config saved to /var/cache/conftool/dbconfig/20220825-051737-marostegui.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1186 with minimal weight in s1 T313569', diff saved to https://phabricator.wikimedia.org/P33011 and previous config saved to /var/cache/conftool/dbconfig/20220825-051155-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1186 to dbctl', diff saved to https://phabricator.wikimedia.org/P33010 and previous config saved to /var/cache/conftool/dbconfig/20220825-051130-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P33008 and previous config saved to /var/cache/conftool/dbconfig/20220825-050713-root.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T314041)', diff saved to https://phabricator.wikimedia.org/P33007 and previous config saved to /var/cache/conftool/dbconfig/20220825-050539-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33006 and previous config saved to /var/cache/conftool/dbconfig/20220825-045033-ladsgroup.json
  • 04:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P33005 and previous config saved to /var/cache/conftool/dbconfig/20220825-043527-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T314041)', diff saved to https://phabricator.wikimedia.org/P33004 and previous config saved to /var/cache/conftool/dbconfig/20220825-042020-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T314041)', diff saved to https://phabricator.wikimedia.org/P33003 and previous config saved to /var/cache/conftool/dbconfig/20220825-041833-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T314041)', diff saved to https://phabricator.wikimedia.org/P33002 and previous config saved to /var/cache/conftool/dbconfig/20220825-041812-ladsgroup.json
  • 04:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33001 and previous config saved to /var/cache/conftool/dbconfig/20220825-040306-ladsgroup.json
  • 03:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P33000 and previous config saved to /var/cache/conftool/dbconfig/20220825-034759-ladsgroup.json
  • 03:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T314041)', diff saved to https://phabricator.wikimedia.org/P32999 and previous config saved to /var/cache/conftool/dbconfig/20220825-033253-ladsgroup.json
  • 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2165 (T314041)', diff saved to https://phabricator.wikimedia.org/P32998 and previous config saved to /var/cache/conftool/dbconfig/20220825-033107-ladsgroup.json
  • 03:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32997 and previous config saved to /var/cache/conftool/dbconfig/20220825-033045-ladsgroup.json
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P32996 and previous config saved to /var/cache/conftool/dbconfig/20220825-031539-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P32995 and previous config saved to /var/cache/conftool/dbconfig/20220825-030033-ladsgroup.json
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32994 and previous config saved to /var/cache/conftool/dbconfig/20220825-024527-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32993 and previous config saved to /var/cache/conftool/dbconfig/20220825-024339-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32992 and previous config saved to /var/cache/conftool/dbconfig/20220825-024318-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P32991 and previous config saved to /var/cache/conftool/dbconfig/20220825-022812-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P32990 and previous config saved to /var/cache/conftool/dbconfig/20220825-021306-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32989 and previous config saved to /var/cache/conftool/dbconfig/20220825-015800-ladsgroup.json
  • 01:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32988 and previous config saved to /var/cache/conftool/dbconfig/20220825-015612-ladsgroup.json
  • 01:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T314041)', diff saved to https://phabricator.wikimedia.org/P32987 and previous config saved to /var/cache/conftool/dbconfig/20220825-015550-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P32986 and previous config saved to /var/cache/conftool/dbconfig/20220825-014044-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P32985 and previous config saved to /var/cache/conftool/dbconfig/20220825-012538-ladsgroup.json
  • 01:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T314041)', diff saved to https://phabricator.wikimedia.org/P32984 and previous config saved to /var/cache/conftool/dbconfig/20220825-011032-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T314041)', diff saved to https://phabricator.wikimedia.org/P32983 and previous config saved to /var/cache/conftool/dbconfig/20220825-010845-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 01:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T314041)', diff saved to https://phabricator.wikimedia.org/P32982 and previous config saved to /var/cache/conftool/dbconfig/20220825-010824-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P32981 and previous config saved to /var/cache/conftool/dbconfig/20220825-005318-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P32980 and previous config saved to /var/cache/conftool/dbconfig/20220825-003812-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T314041)', diff saved to https://phabricator.wikimedia.org/P32979 and previous config saved to /var/cache/conftool/dbconfig/20220825-002306-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T314041)', diff saved to https://phabricator.wikimedia.org/P32978 and previous config saved to /var/cache/conftool/dbconfig/20220825-002120-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 00:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 00:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 00:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T314041)', diff saved to https://phabricator.wikimedia.org/P32977 and previous config saved to /var/cache/conftool/dbconfig/20220825-001949-ladsgroup.json
  • 00:15 ejegg: fundraising scheduled jobs re-enabled
  • 00:08 eileen: config revision changed from ab95bc89 to 2d10cc5f
  • 00:08 eileen: civicrm upgraded from ff9b377d to a31c7590
  • 00:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32976 and previous config saved to /var/cache/conftool/dbconfig/20220825-000443-ladsgroup.json

2022-08-24

  • 23:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32975 and previous config saved to /var/cache/conftool/dbconfig/20220824-234937-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T314041)', diff saved to https://phabricator.wikimedia.org/P32974 and previous config saved to /var/cache/conftool/dbconfig/20220824-233431-ladsgroup.json
  • 23:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T314041)', diff saved to https://phabricator.wikimedia.org/P32973 and previous config saved to /var/cache/conftool/dbconfig/20220824-233046-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32972 and previous config saved to /var/cache/conftool/dbconfig/20220824-233025-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32971 and previous config saved to /var/cache/conftool/dbconfig/20220824-231519-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32970 and previous config saved to /var/cache/conftool/dbconfig/20220824-230013-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32969 and previous config saved to /var/cache/conftool/dbconfig/20220824-224507-ladsgroup.json
  • 22:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32968 and previous config saved to /var/cache/conftool/dbconfig/20220824-224214-ladsgroup.json
  • 22:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 22:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T314041)', diff saved to https://phabricator.wikimedia.org/P32967 and previous config saved to /var/cache/conftool/dbconfig/20220824-224153-ladsgroup.json
  • 22:37 ryankemper: [Elastic] We're back to green in `cloudelastic-chi`, so cloudelastic is back to fully healthy
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32966 and previous config saved to /var/cache/conftool/dbconfig/20220824-222646-ladsgroup.json
  • 22:20 ryankemper: [Elastic] We've got the cloudelastic instances all back up. A bunch of shard recoveries ongoing; currently the cluster is red. It might go all the way back to green; hard to say until the shard recoveries complete.
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32965 and previous config saved to /var/cache/conftool/dbconfig/20220824-221140-ladsgroup.json
  • 21:58 ryankemper: [Elastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart elasticsearch_6@cloudelastic-chi-eqiad.service`, 1003 was also oom-killed: `[4165984.362182] Out of memory: Killed process 3759 (java) total-vm:2277062348kB, anon-rss:61648756kB, file-rss:0kB, shmem-rss:0kB, UID:113 pgtables:1448136kB oom_score_adj:0`
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T314041)', diff saved to https://phabricator.wikimedia.org/P32964 and previous config saved to /var/cache/conftool/dbconfig/20220824-215634-ladsgroup.json
  • 21:54 ryankemper: [Elastic] `ryankemper@cloudelastic1004:~$ sudo systemctl restart elasticsearch_6@cloudelastic-chi-eqiad.service` Restarting 1004's chi eqiad, it died due to `Aug 24 21:43:21 cloudelastic1004 systemd[1]: elasticsearch_6@cloudelastic-chi-eqiad.service: Main process exited, code=killed, status=9/KILL`
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T314041)', diff saved to https://phabricator.wikimedia.org/P32963 and previous config saved to /var/cache/conftool/dbconfig/20220824-215143-ladsgroup.json
  • 21:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 21:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 21:51 eileen: civicrm upgraded from 632d5f5f to ff9b377d
  • 21:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 21:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 21:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T314041)', diff saved to https://phabricator.wikimedia.org/P32962 and previous config saved to /var/cache/conftool/dbconfig/20220824-215025-ladsgroup.json
  • 21:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 6 hosts with reason: T316159
  • 21:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 6 hosts with reason: T316159
  • 21:48 eileen: config revision changed from c2aa4158 to ab95bc89
  • 21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32961 and previous config saved to /var/cache/conftool/dbconfig/20220824-213519-ladsgroup.json
  • 21:23 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw134[1-8].eqiad.wmnet,cluster=api_appserver
  • 21:22 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw133[1-9].eqiad.wmnet,cluster=api_appserver
  • 21:22 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw133[1-9].eqiad.wmnet,cluster=appserver
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32959 and previous config saved to /var/cache/conftool/dbconfig/20220824-212013-ladsgroup.json
  • 21:20 mutante: setting weight to 25 (from 30) for appservers and API servers in the range mw1307 through mw1348 because they are of an older hardware type (not changing weights of jobrunners/videoscalers even if in this range) (T304800)
  • 21:18 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw132[1-9].eqiad.wmnet
  • 21:15 dzahn@cumin2002: conftool action : set/weight=25; selector: name=mw131[2-7].eqiad.wmnet
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T314041)', diff saved to https://phabricator.wikimedia.org/P32958 and previous config saved to /var/cache/conftool/dbconfig/20220824-210507-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T314041)', diff saved to https://phabricator.wikimedia.org/P32957 and previous config saved to /var/cache/conftool/dbconfig/20220824-210216-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 21:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32956 and previous config saved to /var/cache/conftool/dbconfig/20220824-210155-ladsgroup.json
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32955 and previous config saved to /var/cache/conftool/dbconfig/20220824-204649-ladsgroup.json
  • 20:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
  • 20:40 mutante: otrs1001 - systemctl reset-failed
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32954 and previous config saved to /var/cache/conftool/dbconfig/20220824-203143-ladsgroup.json
  • 20:21 ejegg: updated standalone SmashPig deploy from 13e9e9cc to 11ba0a1b
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32953 and previous config saved to /var/cache/conftool/dbconfig/20220824-201637-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T314041)', diff saved to https://phabricator.wikimedia.org/P32952 and previous config saved to /var/cache/conftool/dbconfig/20220824-201344-ladsgroup.json
  • 20:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 20:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T314041)', diff saved to https://phabricator.wikimedia.org/P32951 and previous config saved to /var/cache/conftool/dbconfig/20220824-201224-ladsgroup.json
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32950 and previous config saved to /var/cache/conftool/dbconfig/20220824-195717-ladsgroup.json
  • 19:55 ejegg: civicrm upgraded from edfe2f16 to 632d5f5f
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32949 and previous config saved to /var/cache/conftool/dbconfig/20220824-194211-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T314041)', diff saved to https://phabricator.wikimedia.org/P32948 and previous config saved to /var/cache/conftool/dbconfig/20220824-192705-ladsgroup.json
  • 19:23 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/GeoCrumbs/includes/Hooks.php: Backport: Convert page title to variant properly (T316085) (duration: 02m 50s)
  • 19:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T314041)', diff saved to https://phabricator.wikimedia.org/P32947 and previous config saved to /var/cache/conftool/dbconfig/20220824-192119-ladsgroup.json
  • 19:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 19:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 19:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 19:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P32946 and previous config saved to /var/cache/conftool/dbconfig/20220824-191943-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32945 and previous config saved to /var/cache/conftool/dbconfig/20220824-190437-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32944 and previous config saved to /var/cache/conftool/dbconfig/20220824-184931-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P32943 and previous config saved to /var/cache/conftool/dbconfig/20220824-183425-ladsgroup.json
  • 17:46 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
  • 17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1103.eqiad.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1103.eqiad.wmnet with reason: Maintenance
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T314041)', diff saved to https://phabricator.wikimedia.org/P32942 and previous config saved to /var/cache/conftool/dbconfig/20220824-173409-ladsgroup.json
  • 17:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 17:06 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 16:51 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/skins/Vector/resources/mediawiki.less.legacy/mediawiki.skin.variables.less: Backport: Vector legacy no longer imports variables from Vector modern (T213778) (duration: 02m 52s)
  • 16:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 16:30 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 16:26 mutante: mwmaint1002 systemctl start mediawiki_job_initsitestats T315121
  • 16:17 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 16:15 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 16:05 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 16:00 hashar: Restarted CI Jenkins, Release Jenkins, Gerrit replica and Gerrit
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312975)', diff saved to https://phabricator.wikimedia.org/P32941 and previous config saved to /var/cache/conftool/dbconfig/20220824-151445-ladsgroup.json
  • 15:12 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Restarting to apply OpenJDK 8u342 - eevans@cumin1001
  • 15:04 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Restarting to canary OpenJDK 8u342 - eevans@cumin1001
  • 15:01 btullis: restarting pybal on lvs1019
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P32940 and previous config saved to /var/cache/conftool/dbconfig/20220824-145939-ladsgroup.json
  • 14:57 btullis: restarting pybal on lvs1020
  • 14:55 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Restarting to canary OpenJDK 8u342 - eevans@cumin1001
  • 14:48 moritzm: powercycling krb2002
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P32939 and previous config saved to /var/cache/conftool/dbconfig/20220824-144432-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T314041)', diff saved to https://phabricator.wikimedia.org/P32938 and previous config saved to /var/cache/conftool/dbconfig/20220824-143923-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T312975)', diff saved to https://phabricator.wikimedia.org/P32937 and previous config saved to /var/cache/conftool/dbconfig/20220824-142926-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2115 (T312975)', diff saved to https://phabricator.wikimedia.org/P32936 and previous config saved to /var/cache/conftool/dbconfig/20220824-142715-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312975)', diff saved to https://phabricator.wikimedia.org/P32935 and previous config saved to /var/cache/conftool/dbconfig/20220824-142623-ladsgroup.json
  • 14:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32934 and previous config saved to /var/cache/conftool/dbconfig/20220824-142416-ladsgroup.json
  • 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P32933 and previous config saved to /var/cache/conftool/dbconfig/20220824-141117-ladsgroup.json
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32932 and previous config saved to /var/cache/conftool/dbconfig/20220824-140910-ladsgroup.json
  • 14:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P32931 and previous config saved to /var/cache/conftool/dbconfig/20220824-135611-ladsgroup.json
  • 13:55 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T314041)', diff saved to https://phabricator.wikimedia.org/P32930 and previous config saved to /var/cache/conftool/dbconfig/20220824-135404-ladsgroup.json
  • 13:49 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "Group 1 wikis to 1.39.0-wmf.26" # T316085 T314187
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T314041)', diff saved to https://phabricator.wikimedia.org/P32929 and previous config saved to /var/cache/conftool/dbconfig/20220824-134118-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T312975)', diff saved to https://phabricator.wikimedia.org/P32928 and previous config saved to /var/cache/conftool/dbconfig/20220824-134104-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T314041)', diff saved to https://phabricator.wikimedia.org/P32927 and previous config saved to /var/cache/conftool/dbconfig/20220824-134057-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T312975)', diff saved to https://phabricator.wikimedia.org/P32926 and previous config saved to /var/cache/conftool/dbconfig/20220824-133953-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312975)', diff saved to https://phabricator.wikimedia.org/P32925 and previous config saved to /var/cache/conftool/dbconfig/20220824-133932-ladsgroup.json
  • 13:31 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Africa Wikimedia Developers Project" "African Wikimedia Technical Community" "Taavi" --reason "per request phab:T316066" # T316066
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32924 and previous config saved to /var/cache/conftool/dbconfig/20220824-132908-root.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P32919 and previous config saved to /var/cache/conftool/dbconfig/20220824-130920-ladsgroup.json
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32918 and previous config saved to /var/cache/conftool/dbconfig/20220824-125858-root.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T314041)', diff saved to https://phabricator.wikimedia.org/P32917 and previous config saved to /var/cache/conftool/dbconfig/20220824-125537-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T312975)', diff saved to https://phabricator.wikimedia.org/P32916 and previous config saved to /var/cache/conftool/dbconfig/20220824-125414-ladsgroup.json
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T312975)', diff saved to https://phabricator.wikimedia.org/P32915 and previous config saved to /var/cache/conftool/dbconfig/20220824-125003-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312975)', diff saved to https://phabricator.wikimedia.org/P32914 and previous config saved to /var/cache/conftool/dbconfig/20220824-124905-ladsgroup.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32913 and previous config saved to /var/cache/conftool/dbconfig/20220824-124354-root.json
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T314041)', diff saved to https://phabricator.wikimedia.org/P32912 and previous config saved to /var/cache/conftool/dbconfig/20220824-124346-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P32911 and previous config saved to /var/cache/conftool/dbconfig/20220824-123358-ladsgroup.json
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32910 and previous config saved to /var/cache/conftool/dbconfig/20220824-122848-root.json
  • 12:24 moritzm: installing containerd security updates
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120', diff saved to https://phabricator.wikimedia.org/P32909 and previous config saved to /var/cache/conftool/dbconfig/20220824-121852-ladsgroup.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling after cloning db1191', diff saved to https://phabricator.wikimedia.org/P32908 and previous config saved to /var/cache/conftool/dbconfig/20220824-121343-root.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1120 (T312975)', diff saved to https://phabricator.wikimedia.org/P32907 and previous config saved to /var/cache/conftool/dbconfig/20220824-120346-ladsgroup.json
  • 12:01 Amir1: killed refresh links-recomm scripts in rowiki, cswiki, simplewiki, frwiki (T299021)
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1120 (T312975)', diff saved to https://phabricator.wikimedia.org/P32906 and previous config saved to /var/cache/conftool/dbconfig/20220824-115935-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1120.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1120.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:42 klausman@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching ml-cache*: Rolling restart to activate new JRE - klausman@cumin1001
  • 11:38 slyngs: Migrate mdadm array checks to systemd timers. Gerrit: 819577
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32905 and previous config saved to /var/cache/conftool/dbconfig/20220824-112938-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32904 and previous config saved to /var/cache/conftool/dbconfig/20220824-111433-root.json
  • 11:07 klausman@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching ml-cache*: Rolling restart to activate new JRE - klausman@cumin1001
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32903 and previous config saved to /var/cache/conftool/dbconfig/20220824-105928-root.json
  • 10:52 vgutierrez: disable origin coalescing in ats@cp600[78] - T315911
  • 10:46 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 10:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32902 and previous config saved to /var/cache/conftool/dbconfig/20220824-104424-root.json
  • 10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 10:35 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 10:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 10:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32901 and previous config saved to /var/cache/conftool/dbconfig/20220824-102919-root.json
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 5%: Repooling after cloning db1190', diff saved to https://phabricator.wikimedia.org/P32900 and previous config saved to /var/cache/conftool/dbconfig/20220824-101414-root.json
  • 09:46 vgutierrez: Restart incremental roll-out of query-sorting at 1% - T314868
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32899 and previous config saved to /var/cache/conftool/dbconfig/20220824-085902-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32898 and previous config saved to /var/cache/conftool/dbconfig/20220824-085639-root.json
  • 08:49 jayme: jayme@builder-envoy-03:~$ sudo apt-get remove --purge linux-image-4.19.0-6-amd64-dbg linux-image-4.19.0-14-amd64-dbg
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32897 and previous config saved to /var/cache/conftool/dbconfig/20220824-084357-root.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32896 and previous config saved to /var/cache/conftool/dbconfig/20220824-084134-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32895 and previous config saved to /var/cache/conftool/dbconfig/20220824-082852-root.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P32893 and previous config saved to /var/cache/conftool/dbconfig/20220824-082809-root.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32892 and previous config saved to /var/cache/conftool/dbconfig/20220824-082630-root.json
  • 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:19 hashar@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.26 refs T314187 (duration: 02m 46s)
  • 08:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:16 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.26 refs T314187
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32891 and previous config saved to /var/cache/conftool/dbconfig/20220824-081347-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32890 and previous config saved to /var/cache/conftool/dbconfig/20220824-081125-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32888 and previous config saved to /var/cache/conftool/dbconfig/20220824-075955-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32887 and previous config saved to /var/cache/conftool/dbconfig/20220824-075946-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P32886 and previous config saved to /var/cache/conftool/dbconfig/20220824-075927-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32885 and previous config saved to /var/cache/conftool/dbconfig/20220824-075843-root.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P32884 and previous config saved to /var/cache/conftool/dbconfig/20220824-075620-root.json
  • 07:47 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc2 master T315526 (duration: 02m 48s)
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32883 and previous config saved to /var/cache/conftool/dbconfig/20220824-074451-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32882 and previous config saved to /var/cache/conftool/dbconfig/20220824-074441-root.json
  • 07:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:41 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1014 to pc2 master T315526 (duration: 03m 03s)
  • 07:40 marostegui: Promote pc1014 to pc2 master T315526
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32880 and previous config saved to /var/cache/conftool/dbconfig/20220824-072946-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32879 and previous config saved to /var/cache/conftool/dbconfig/20220824-072937-root.json
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32878 and previous config saved to /var/cache/conftool/dbconfig/20220824-071441-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32877 and previous config saved to /var/cache/conftool/dbconfig/20220824-071432-root.json
  • 07:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:13 tgr: UTC morning deploys done
  • 07:12 tgr@deploy1002: Synchronized wmf-config: Config: Drop unused wgGECampaignPattern (duration: 02m 57s)
  • 07:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32876 and previous config saved to /var/cache/conftool/dbconfig/20220824-065937-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32875 and previous config saved to /var/cache/conftool/dbconfig/20220824-065927-root.json
  • 06:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:50 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: Changes list filter: don't add fields that are already in the query (T316026) (duration: 02m 57s)
  • 06:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:46 hashar: Restarted Gerrit to enable replication configuration autoloading
  • 06:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32874 and previous config saved to /var/cache/conftool/dbconfig/20220824-064432-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32873 and previous config saved to /var/cache/conftool/dbconfig/20220824-064423-root.json
  • 06:42 marostegui: dbmaint x1 codfw T312574
  • 06:41 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: Changes list filter: don't add fields that are already in the query (T316026) (duration: 03m 07s)
  • 06:37 marostegui: dbmaint s3 T312160
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32872 and previous config saved to /var/cache/conftool/dbconfig/20220824-062927-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 4%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32871 and previous config saved to /var/cache/conftool/dbconfig/20220824-062918-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P32869 and previous config saved to /var/cache/conftool/dbconfig/20220824-061532-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32868 and previous config saved to /var/cache/conftool/dbconfig/20220824-061422-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32867 and previous config saved to /var/cache/conftool/dbconfig/20220824-061413-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32866 and previous config saved to /var/cache/conftool/dbconfig/20220824-055918-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 2%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32865 and previous config saved to /var/cache/conftool/dbconfig/20220824-055909-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P32863 and previous config saved to /var/cache/conftool/dbconfig/20220824-054719-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling for the first time', diff saved to https://phabricator.wikimedia.org/P32862 and previous config saved to /var/cache/conftool/dbconfig/20220824-054404-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1187 with minimal weight', diff saved to https://phabricator.wikimedia.org/P32861 and previous config saved to /var/cache/conftool/dbconfig/20220824-054018-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1189 with minimal weight', diff saved to https://phabricator.wikimedia.org/P32860 and previous config saved to /var/cache/conftool/dbconfig/20220824-053434-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2180 from s4 to s6', diff saved to https://phabricator.wikimedia.org/P32859 and previous config saved to /var/cache/conftool/dbconfig/20220824-053311-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1189 with minimal weight', diff saved to https://phabricator.wikimedia.org/P32858 and previous config saved to /var/cache/conftool/dbconfig/20220824-053141-root.json

2022-08-23

  • 22:31 mutante: mwmaint1002 - find /var/lib/puppet/clientbucket -type f -size +100M -delete
  • 22:16 dancy@deploy1002: Testing. Ignore
  • 21:19 wfan: Updating di-config from e447ff7c to 3c27af23
  • 21:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 8fb3575: trwikiquote: Enable block feature of abusefilter (T315736) (duration: 02m 57s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T315604
  • 19:34 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T315604
  • 18:15 hashar: Restarting CI Jenkins
  • 18:07 ejegg: payments-wiki upgraded from fb50c013 to a63b300e
  • 17:41 hashar: Stopping Gerrit
  • 17:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@cb7edfb]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942 (duration: 00m 08s)
  • 17:39 hashar@deploy1002: Started deploy [gerrit/gerrit@cb7edfb]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942
  • 17:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942 (duration: 00m 04s)
  • 17:39 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Revert Gerrit from 3.4.5 to 3.4.4 # T315942
  • 17:37 cwhite: restart tcpircbot T257861
  • 17:33 inflatador: bking@cumin starting thanos-swift cleanup for wdqs T316031
  • 17:21 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes/specials/SpecialRecentChangesLinked.php: Backport: SpecialRecentChangesLinked: Pass query builder instead of SQL (duration: 03m 32s)
  • 17:17 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Backport: rdbms: Switch to getConnectionInternal() in getPrimaryPos() (duration: 03m 27s)
  • 17:00 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.26/includes/specials/SpecialRecentChangesLinked.php: Backport: SpecialRecentChangesLinked: Pass query builder instead of SQL (duration: 03m 34s)
  • 16:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:03 krinkle@deploy1002: Synchronized wmf-config/: Ifd90ae (duration: 03m 18s)
  • 16:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32854 and previous config saved to /var/cache/conftool/dbconfig/20220823-155013-root.json
  • 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:44 krinkle@deploy1002: Synchronized wmf-config/: I1c5b05 (duration: 03m 25s)
  • 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:41 mutante: gerrit - service restart
  • 15:36 mutante: gerrit2002 - service restart
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32853 and previous config saved to /var/cache/conftool/dbconfig/20220823-153508-root.json
  • 15:30 sbassett: Deployed security patch for T307278 to wmf.25
  • 15:25 sbassett: Deployed security patch for T307278 to wmf.26
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32852 and previous config saved to /var/cache/conftool/dbconfig/20220823-152003-root.json
  • 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dave Pifke out of all services on: 774 hosts
  • 15:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dave Pifke out of all services on: 774 hosts
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dave Pifke out of all services on: 1238 hosts
  • 15:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dave Pifke out of all services on: 1238 hosts
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Erin Yener out of all services on: 774 hosts
  • 15:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Erin Yener out of all services on: 774 hosts
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Erin Yener out of all services on: 1238 hosts
  • 15:06 mutante: gerrit - service restart - T315942 - added sshd.enableDeprecatedKexAlgorithms = true
  • 15:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Erin Yener out of all services on: 1238 hosts
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32851 and previous config saved to /var/cache/conftool/dbconfig/20220823-150459-root.json
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Effeietsanders out of all services on: 1238 hosts
  • 15:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Effeietsanders out of all services on: 1238 hosts
  • 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Effeietsanders out of all services on: 774 hosts
  • 15:02 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Effeietsanders out of all services on: 774 hosts
  • 15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32850 and previous config saved to /var/cache/conftool/dbconfig/20220823-144954-root.json
  • 14:49 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:48 moritzm: installing libtirpc security updates
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:42 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: bcef1d5: Start writing to cuc_actor everywhere (T233004) (duration: 03m 18s)
  • 14:39 krinkle@deploy1002: Synchronized wmf-config/redis.php: Ib99479 (duration: 03m 47s)
  • 14:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32849 and previous config saved to /var/cache/conftool/dbconfig/20220823-143450-root.json
  • 14:21 marostegui: Run schema change on db1160 T303603
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P32848 and previous config saved to /var/cache/conftool/dbconfig/20220823-142011-root.json
  • 14:03 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:02 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:01 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:00 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:59 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3b9e0a: Start writing to cuc_actor on s8 (T233004) (duration: 03m 31s)
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 12:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Maintenance
  • 12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Maintenance
  • 12:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312972)', diff saved to https://phabricator.wikimedia.org/P32847 and previous config saved to /var/cache/conftool/dbconfig/20220823-125824-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32846 and previous config saved to /var/cache/conftool/dbconfig/20220823-124317-marostegui.json
  • 12:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2019.codfw.wmnet to cluster codfw and group B
  • 12:39 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2019.codfw.wmnet to cluster codfw and group B
  • 12:33 vgutierrez: Incremental roll-out of query-sorting (15%) - T314868
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P32845 and previous config saved to /var/cache/conftool/dbconfig/20220823-122811-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T312972)', diff saved to https://phabricator.wikimedia.org/P32844 and previous config saved to /var/cache/conftool/dbconfig/20220823-121305-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T312972)', diff saved to https://phabricator.wikimedia.org/P32843 and previous config saved to /var/cache/conftool/dbconfig/20220823-121159-marostegui.json
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32842 and previous config saved to /var/cache/conftool/dbconfig/20220823-121055-marostegui.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32841 and previous config saved to /var/cache/conftool/dbconfig/20220823-115549-marostegui.json
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32840 and previous config saved to /var/cache/conftool/dbconfig/20220823-114220-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P32839 and previous config saved to /var/cache/conftool/dbconfig/20220823-114043-marostegui.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32838 and previous config saved to /var/cache/conftool/dbconfig/20220823-112715-root.json
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32837 and previous config saved to /var/cache/conftool/dbconfig/20220823-112537-marostegui.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32836 and previous config saved to /var/cache/conftool/dbconfig/20220823-112430-marostegui.json
  • 11:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312972)', diff saved to https://phabricator.wikimedia.org/P32835 and previous config saved to /var/cache/conftool/dbconfig/20220823-112408-marostegui.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32834 and previous config saved to /var/cache/conftool/dbconfig/20220823-111210-root.json
  • 11:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32833 and previous config saved to /var/cache/conftool/dbconfig/20220823-111139-root.json
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32832 and previous config saved to /var/cache/conftool/dbconfig/20220823-110902-marostegui.json
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32831 and previous config saved to /var/cache/conftool/dbconfig/20220823-105706-root.json
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32830 and previous config saved to /var/cache/conftool/dbconfig/20220823-105634-root.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P32829 and previous config saved to /var/cache/conftool/dbconfig/20220823-105356-marostegui.json
  • 10:49 btullis@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: cluster=dse-k8s,service=kubemaster
  • 10:46 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32828 and previous config saved to /var/cache/conftool/dbconfig/20220823-104201-root.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 60%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32827 and previous config saved to /var/cache/conftool/dbconfig/20220823-104126-root.json
  • 10:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T312972)', diff saved to https://phabricator.wikimedia.org/P32826 and previous config saved to /var/cache/conftool/dbconfig/20220823-103850-marostegui.json
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T312972)', diff saved to https://phabricator.wikimedia.org/P32825 and previous config saved to /var/cache/conftool/dbconfig/20220823-103742-marostegui.json
  • 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312972)', diff saved to https://phabricator.wikimedia.org/P32824 and previous config saved to /var/cache/conftool/dbconfig/20220823-103704-marostegui.json
  • 10:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Repooling after cloning db1189', diff saved to https://phabricator.wikimedia.org/P32823 and previous config saved to /var/cache/conftool/dbconfig/20220823-102657-root.json
  • 10:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32822 and previous config saved to /var/cache/conftool/dbconfig/20220823-102622-root.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32821 and previous config saved to /var/cache/conftool/dbconfig/20220823-102158-marostegui.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 40%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32820 and previous config saved to /var/cache/conftool/dbconfig/20220823-101117-root.json
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32819 and previous config saved to /var/cache/conftool/dbconfig/20220823-101048-root.json
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P32818 and previous config saved to /var/cache/conftool/dbconfig/20220823-100652-marostegui.json
  • 10:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:56 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.26 refs T314187
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 30%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32817 and previous config saved to /var/cache/conftool/dbconfig/20220823-095613-root.json
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32816 and previous config saved to /var/cache/conftool/dbconfig/20220823-095543-root.json
  • 09:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T312972)', diff saved to https://phabricator.wikimedia.org/P32815 and previous config saved to /var/cache/conftool/dbconfig/20220823-095146-marostegui.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T312972)', diff saved to https://phabricator.wikimedia.org/P32814 and previous config saved to /var/cache/conftool/dbconfig/20220823-095039-marostegui.json
  • 09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312972)', diff saved to https://phabricator.wikimedia.org/P32813 and previous config saved to /var/cache/conftool/dbconfig/20220823-095018-marostegui.json
  • 09:49 hashar@deploy1002: Pruned MediaWiki: 1.39.0-wmf.23 (duration: 02m 20s)
  • 09:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:42 XioNoX: add NAT rule for frdev1002 on pfw3-eqiad - T315579
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 20%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32812 and previous config saved to /var/cache/conftool/dbconfig/20220823-094108-root.json
  • 09:41 hashar@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.26 refs T314187 (duration: 35m 32s)
  • 09:40 vgutierrez: Incremental roll-out of query-sorting (5%) - T314868
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32811 and previous config saved to /var/cache/conftool/dbconfig/20220823-094039-root.json
  • 09:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32810 and previous config saved to /var/cache/conftool/dbconfig/20220823-093512-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32809 and previous config saved to /var/cache/conftool/dbconfig/20220823-092603-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32808 and previous config saved to /var/cache/conftool/dbconfig/20220823-092534-root.json
  • 09:23 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 09:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 09:21 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P32807 and previous config saved to /var/cache/conftool/dbconfig/20220823-092006-marostegui.json
  • 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 8%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32806 and previous config saved to /var/cache/conftool/dbconfig/20220823-091059-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32805 and previous config saved to /var/cache/conftool/dbconfig/20220823-091029-root.json
  • 09:08 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:05 hashar@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.26 refs T314187
  • 09:05 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T312972)', diff saved to https://phabricator.wikimedia.org/P32804 and previous config saved to /var/cache/conftool/dbconfig/20220823-090500-marostegui.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T312972)', diff saved to https://phabricator.wikimedia.org/P32803 and previous config saved to /var/cache/conftool/dbconfig/20220823-090353-marostegui.json
  • 09:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312972)', diff saved to https://phabricator.wikimedia.org/P32802 and previous config saved to /var/cache/conftool/dbconfig/20220823-090332-marostegui.json
  • 08:56 hashar@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/Translate/extension.json: Backport: Add declarations for TranslatablePage in extension.json (T315889) (duration: 03m 39s)
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32801 and previous config saved to /var/cache/conftool/dbconfig/20220823-085554-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32800 and previous config saved to /var/cache/conftool/dbconfig/20220823-085525-root.json
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32799 and previous config saved to /var/cache/conftool/dbconfig/20220823-084826-marostegui.json
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 08:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2019.codfw.wmnet to cluster codfw and group B
  • 08:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2019.codfw.wmnet to cluster codfw and group B
  • 08:41 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 08:41 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32798 and previous config saved to /var/cache/conftool/dbconfig/20220823-084050-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32797 and previous config saved to /var/cache/conftool/dbconfig/20220823-084020-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P32796 and previous config saved to /var/cache/conftool/dbconfig/20220823-083319-marostegui.json
  • 08:33 hashar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable message bundle on MetaWiki for WikiLearn (T311587) (duration: 03m 27s)
  • 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P32794 and previous config saved to /var/cache/conftool/dbconfig/20220823-082605-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32793 and previous config saved to /var/cache/conftool/dbconfig/20220823-082545-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32792 and previous config saved to /var/cache/conftool/dbconfig/20220823-082515-root.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P32790 and previous config saved to /var/cache/conftool/dbconfig/20220823-082336-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32789 and previous config saved to /var/cache/conftool/dbconfig/20220823-082215-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T312972)', diff saved to https://phabricator.wikimedia.org/P32788 and previous config saved to /var/cache/conftool/dbconfig/20220823-081813-marostegui.json
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T312972)', diff saved to https://phabricator.wikimedia.org/P32787 and previous config saved to /var/cache/conftool/dbconfig/20220823-081706-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32786 and previous config saved to /var/cache/conftool/dbconfig/20220823-081645-marostegui.json
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32785 and previous config saved to /var/cache/conftool/dbconfig/20220823-080710-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32784 and previous config saved to /var/cache/conftool/dbconfig/20220823-080139-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P32783 and previous config saved to /var/cache/conftool/dbconfig/20220823-074633-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32781 and previous config saved to /var/cache/conftool/dbconfig/20220823-073127-marostegui.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T312972)', diff saved to https://phabricator.wikimedia.org/P32780 and previous config saved to /var/cache/conftool/dbconfig/20220823-073020-marostegui.json
  • 07:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2019.codfw.wmnet with OS bullseye
  • 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312972)', diff saved to https://phabricator.wikimedia.org/P32779 and previous config saved to /var/cache/conftool/dbconfig/20220823-072943-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P32778 and previous config saved to /var/cache/conftool/dbconfig/20220823-071437-marostegui.json
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2019.codfw.wmnet with reason: host reimage
  • 07:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2019.codfw.wmnet with reason: host reimage
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P32777 and previous config saved to /var/cache/conftool/dbconfig/20220823-065931-marostegui.json
  • 06:50 kart_: Updated cxserver to 2022-08-22-093815-production (T308248, T308371)
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2019.codfw.wmnet with OS bullseye
  • 06:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:48 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:45 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T312972)', diff saved to https://phabricator.wikimedia.org/P32776 and previous config saved to /var/cache/conftool/dbconfig/20220823-064425-marostegui.json
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1163 (T312972)', diff saved to https://phabricator.wikimedia.org/P32775 and previous config saved to /var/cache/conftool/dbconfig/20220823-064318-marostegui.json
  • 06:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312972)', diff saved to https://phabricator.wikimedia.org/P32774 and previous config saved to /var/cache/conftool/dbconfig/20220823-064257-marostegui.json
  • 06:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2019.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 06:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:38 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32773 and previous config saved to /var/cache/conftool/dbconfig/20220823-062751-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P32772 and previous config saved to /var/cache/conftool/dbconfig/20220823-061245-marostegui.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T312972)', diff saved to https://phabricator.wikimedia.org/P32771 and previous config saved to /var/cache/conftool/dbconfig/20220823-055739-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T312972)', diff saved to https://phabricator.wikimedia.org/P32770 and previous config saved to /var/cache/conftool/dbconfig/20220823-053929-marostegui.json
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312972)', diff saved to https://phabricator.wikimedia.org/P32769 and previous config saved to /var/cache/conftool/dbconfig/20220823-053852-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32768 and previous config saved to /var/cache/conftool/dbconfig/20220823-052346-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P32767 and previous config saved to /var/cache/conftool/dbconfig/20220823-050840-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T312972)', diff saved to https://phabricator.wikimedia.org/P32765 and previous config saved to /var/cache/conftool/dbconfig/20220823-045334-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P32764 and previous config saved to /var/cache/conftool/dbconfig/20220823-045322-root.json
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T312972)', diff saved to https://phabricator.wikimedia.org/P32763 and previous config saved to /var/cache/conftool/dbconfig/20220823-045227-marostegui.json
  • 04:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 04:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 03:00 TimStarling: on wtp1025,wtp1027,wtp1029,wtp1031,wtp1033,wtp1035: set scaling_governor to performance T315398
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:41 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set energy_performance_preference to balance_performance T315398
  • 01:11 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set scaling_governor to powersave and energy_performance_preference to performance
  • 00:09 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1187.eqiad.wmnet with OS bullseye

2022-08-22

  • 23:55 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
  • 23:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: host reimage
  • 23:39 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1187.eqiad.wmnet with OS bullseye
  • 23:10 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(appservers|api)-ro
  • 23:04 TimStarling: Re-enable multi-DC mode on testwiki, test2wiki and mediawiki.org
  • 21:56 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
  • 21:55 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
  • 21:46 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
  • 21:45 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
  • 21:26 sbassett: Deployed security fix for T310763
  • 21:17 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
  • 21:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: relforge elasticsearch and plugin upgrade - bking@cumin2002 - T315604
  • 21:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2002.codfw.wmnet with OS bullseye
  • 21:06 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host db1185.eqiad.wmnet
  • 21:04 pt1979@cumin1001: START - Cookbook sre.hosts.dhcp for host db1185.eqiad.wmnet
  • 21:02 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1185.eqiad.wmnet with OS bullseye
  • 21:01 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 20:59 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1185.eqiad.wmnet with OS bullseye
  • 20:59 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 20:59 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1195.eqiad.wmnet with OS bullseye
  • 20:58 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
  • 20:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
  • 20:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
  • 20:51 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.25/skins/Vector/: e0ff763: Layout: Restore disabling of max width on certain pages (T315460) (duration: 03m 37s)
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2002.codfw.wmnet with OS bullseye
  • 20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2001.codfw.wmnet with OS bullseye
  • 20:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
  • 20:04 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5ac442f]: Use instance specific HDFS cache on analytics (duration: 00m 40s)
  • 20:03 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5ac442f]: Use instance specific HDFS cache on analytics
  • 19:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2001.codfw.wmnet with OS bullseye
  • 19:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-wf2002']
  • 19:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-wf2002']
  • 19:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-wf2001']
  • 19:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-wf2001']
  • 19:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:11 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_test@5ac442f]: Use instance specific HDFS cache on analytics_test (duration: 00m 17s)
  • 19:11 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_test@5ac442f]: Use instance specific HDFS cache on analytics_test
  • 19:04 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics_test@9edd1ab]: Use instance specific HDFS cache on analytics_test (duration: 00m 05s)
  • 19:04 xcollazo@deploy1002: Started deploy [airflow-dags/analytics_test@9edd1ab]: Use instance specific HDFS cache on analytics_test
  • 18:59 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@5ac442f]: Use instance specific HDFS cache on platform_eng (duration: 00m 10s)
  • 18:59 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@5ac442f]: Use instance specific HDFS cache on platform_eng
  • 18:54 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc-wf2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-wf2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:28 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mc-wf2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:19 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-wf2002
  • 18:19 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-wf2002
  • 18:18 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-wf2001
  • 18:18 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-wf2001
  • 18:12 vgutierrez: disable origin coalescing in ats@cp601[56] - T315911
  • 17:15 damilare: payments-wiki upgraded from f9f91f1f to fb50c013
  • 15:52 XioNoX: un-drain ulsfo-codfw circuit for Lumen hot cut - T300716
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32759 and previous config saved to /var/cache/conftool/dbconfig/20220822-152000-root.json
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32758 and previous config saved to /var/cache/conftool/dbconfig/20220822-150456-root.json
  • 14:54 XioNoX: drain ulsfo-codfw circuit for Lumen hot cut - T300716
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2144', diff saved to https://phabricator.wikimedia.org/P32757 and previous config saved to /var/cache/conftool/dbconfig/20220822-145040-marostegui.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 60%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32756 and previous config saved to /var/cache/conftool/dbconfig/20220822-144951-root.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32755 and previous config saved to /var/cache/conftool/dbconfig/20220822-144943-root.json
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Restore x2 weight', diff saved to https://phabricator.wikimedia.org/P32754 and previous config saved to /var/cache/conftool/dbconfig/20220822-144937-marostegui.json
  • 14:38 moritzm: draining ganeti2019 for reimage T311686
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2144 to x2 primary T315853', diff saved to https://phabricator.wikimedia.org/P32752 and previous config saved to /var/cache/conftool/dbconfig/20220822-143243-root.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32751 and previous config saved to /var/cache/conftool/dbconfig/20220822-143212-root.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32750 and previous config saved to /var/cache/conftool/dbconfig/20220822-143040-root.json
  • 14:24 marostegui: Starting x2 codfw failover from db2144 to db2142 - T315853
  • 14:23 marostegui@cumin1001: dbctl commit (dc=all): 'Set db2142 with weight 0 T313811', diff saved to https://phabricator.wikimedia.org/P32749 and previous config saved to /var/cache/conftool/dbconfig/20220822-142312-marostegui.json
  • 14:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T313811
  • 14:22 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T313811
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 40%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32748 and previous config saved to /var/cache/conftool/dbconfig/20220822-141708-root.json
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32747 and previous config saved to /var/cache/conftool/dbconfig/20220822-141535-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 30%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32746 and previous config saved to /var/cache/conftool/dbconfig/20220822-140203-root.json
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32745 and previous config saved to /var/cache/conftool/dbconfig/20220822-140030-root.json
  • 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs[1014-1016].eqiad.wmnet
  • 13:48 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs[1014-1016].eqiad.wmnet
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 20%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32744 and previous config saved to /var/cache/conftool/dbconfig/20220822-134658-root.json
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 10%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32743 and previous config saved to /var/cache/conftool/dbconfig/20220822-134526-root.json
  • 13:44 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 13:39 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 13:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wdqs[1014-1016].eqiad.wmnet with reason: T314890
  • 13:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on wdqs[1014-1016].eqiad.wmnet with reason: T314890
  • 13:37 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 10%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32742 and previous config saved to /var/cache/conftool/dbconfig/20220822-133154-root.json
  • 13:31 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:31 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 13:31 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 5%: Repooling after schema change', diff saved to https://phabricator.wikimedia.org/P32741 and previous config saved to /var/cache/conftool/dbconfig/20220822-133021-root.json
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P32740 and previous config saved to /var/cache/conftool/dbconfig/20220822-132808-root.json
  • 13:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:25 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 13:17 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 8%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32738 and previous config saved to /var/cache/conftool/dbconfig/20220822-131649-root.json
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes: Backport: SiteStats: Make sure initSiteStats.php re-distribute values (T315693) (duration: 03m 32s)
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 13:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312972)', diff saved to https://phabricator.wikimedia.org/P32737 and previous config saved to /var/cache/conftool/dbconfig/20220822-130732-marostegui.json
  • 13:03 jynus: disabled backup scheduling for backup1002, backup2002 T315864
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32735 and previous config saved to /var/cache/conftool/dbconfig/20220822-130144-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P32734 and previous config saved to /var/cache/conftool/dbconfig/20220822-125226-marostegui.json
  • 12:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 12:48 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32732 and previous config saved to /var/cache/conftool/dbconfig/20220822-124640-root.json
  • 12:45 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 12:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 12:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P32731 and previous config saved to /var/cache/conftool/dbconfig/20220822-123720-marostegui.json
  • 12:33 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'es1023 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32730 and previous config saved to /var/cache/conftool/dbconfig/20220822-123135-root.json
  • 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T312972)', diff saved to https://phabricator.wikimedia.org/P32729 and previous config saved to /var/cache/conftool/dbconfig/20220822-122214-marostegui.json
  • 12:20 jayme: kubernetes1016:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
  • 12:20 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
  • 12:20 moritzm: fix up network config for ldap-replica2006 T273026
  • 12:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1023 for reboot T315542', diff saved to https://phabricator.wikimedia.org/P32728 and previous config saved to /var/cache/conftool/dbconfig/20220822-121401-root.json
  • 12:13 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es5 T315542 (duration: 03m 18s)
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1024 to es5 primary T315542', diff saved to https://phabricator.wikimedia.org/P32727 and previous config saved to /var/cache/conftool/dbconfig/20220822-120611-root.json
  • 12:05 marostegui: Starting es5 eqiad failover from es1023 to es1024 - T315542
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1024 with weight 10 T315542', diff saved to https://phabricator.wikimedia.org/P32726 and previous config saved to /var/cache/conftool/dbconfig/20220822-120141-root.json
  • 12:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:51 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es5 T315542 (duration: 03m 08s)
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switchover es5 T315542
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switchover es5 T315542
  • 11:36 moritzm: installing libdatetime-timezone-perl updates from SUA update
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 100%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32725 and previous config saved to /var/cache/conftool/dbconfig/20220822-113352-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T312972)', diff saved to https://phabricator.wikimedia.org/P32724 and previous config saved to /var/cache/conftool/dbconfig/20220822-112829-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312972)', diff saved to https://phabricator.wikimedia.org/P32723 and previous config saved to /var/cache/conftool/dbconfig/20220822-112808-marostegui.json
  • 11:25 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-ctrl1001.eqiad.wmnet
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 75%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32722 and previous config saved to /var/cache/conftool/dbconfig/20220822-111847-root.json
  • 11:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P32721 and previous config saved to /var/cache/conftool/dbconfig/20220822-111301-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 60%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32720 and previous config saved to /var/cache/conftool/dbconfig/20220822-110342-root.json
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P32719 and previous config saved to /var/cache/conftool/dbconfig/20220822-105755-marostegui.json
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 50%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32718 and previous config saved to /var/cache/conftool/dbconfig/20220822-104838-root.json
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T312972)', diff saved to https://phabricator.wikimedia.org/P32717 and previous config saved to /var/cache/conftool/dbconfig/20220822-104249-marostegui.json
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 40%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32716 and previous config saved to /var/cache/conftool/dbconfig/20220822-103333-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 30%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32715 and previous config saved to /var/cache/conftool/dbconfig/20220822-101828-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 20%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32714 and previous config saved to /var/cache/conftool/dbconfig/20220822-100324-root.json
  • 10:00 vgutierrez: Incremental roll-out of query-sorting (1%) - T314868
  • 09:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
  • 09:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:41 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 09:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:38 XioNoX: push new policy on pfw3-eqiad - T315578
  • 09:36 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 8%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32709 and previous config saved to /var/cache/conftool/dbconfig/20220822-093314-root.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P32708 and previous config saved to /var/cache/conftool/dbconfig/20220822-092706-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 5%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32706 and previous config saved to /var/cache/conftool/dbconfig/20220822-091810-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P32705 and previous config saved to /var/cache/conftool/dbconfig/20220822-091200-marostegui.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 2%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32704 and previous config saved to /var/cache/conftool/dbconfig/20220822-090305-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T312972)', diff saved to https://phabricator.wikimedia.org/P32703 and previous config saved to /var/cache/conftool/dbconfig/20220822-085654-marostegui.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T312972)', diff saved to https://phabricator.wikimedia.org/P32702 and previous config saved to /var/cache/conftool/dbconfig/20220822-085014-marostegui.json
  • 08:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312972)', diff saved to https://phabricator.wikimedia.org/P32701 and previous config saved to /var/cache/conftool/dbconfig/20220822-084942-marostegui.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32700 and previous config saved to /var/cache/conftool/dbconfig/20220822-084800-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 ', diff saved to https://phabricator.wikimedia.org/P32699 and previous config saved to /var/cache/conftool/dbconfig/20220822-084359-root.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1020 (re)pooling @ 1%: Repooling after reboot', diff saved to https://phabricator.wikimedia.org/P32698 and previous config saved to /var/cache/conftool/dbconfig/20220822-084335-root.json
  • 08:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P32697 and previous config saved to /var/cache/conftool/dbconfig/20220822-083436-marostegui.json
  • 08:33 moritzm: powercycling wdqs1014 (unresponsive via botched wdqs-categories process)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1020 for reboot T310485', diff saved to https://phabricator.wikimedia.org/P32696 and previous config saved to /var/cache/conftool/dbconfig/20220822-083341-root.json
  • 08:32 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Enable writes on es4 T315540 (duration: 03m 17s)
  • 08:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32695 and previous config saved to /var/cache/conftool/dbconfig/20220822-082958-root.json
  • 08:29 oblivian@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Introducing variables for php 7.4 migration (duration: 03m 39s)
  • 08:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1021 to es4 primary T315540', diff saved to https://phabricator.wikimedia.org/P32694 and previous config saved to /var/cache/conftool/dbconfig/20220822-082208-root.json
  • 08:21 marostegui: Starting es4 eqiad failover from es1020 to es1021 - T315540
  • 08:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P32693 and previous config saved to /var/cache/conftool/dbconfig/20220822-081930-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Set es1021 with weight 10 T315540', diff saved to https://phabricator.wikimedia.org/P32692 and previous config saved to /var/cache/conftool/dbconfig/20220822-081817-root.json
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:18 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Switchover es4 T315540
  • 08:17 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Switchover es4 T315540
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32691 and previous config saved to /var/cache/conftool/dbconfig/20220822-081453-root.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 marostegui@deploy1002: Synchronized wmf-config/db-production.php: Disable writes on es4 T315540 (duration: 03m 35s)
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T312972)', diff saved to https://phabricator.wikimedia.org/P32690 and previous config saved to /var/cache/conftool/dbconfig/20220822-080424-marostegui.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32689 and previous config saved to /var/cache/conftool/dbconfig/20220822-080020-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32688 and previous config saved to /var/cache/conftool/dbconfig/20220822-080012-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32687 and previous config saved to /var/cache/conftool/dbconfig/20220822-075949-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32686 and previous config saved to /var/cache/conftool/dbconfig/20220822-075941-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2182 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32685 and previous config saved to /var/cache/conftool/dbconfig/20220822-075359-marostegui.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32684 and previous config saved to /var/cache/conftool/dbconfig/20220822-074515-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32683 and previous config saved to /var/cache/conftool/dbconfig/20220822-074507-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32682 and previous config saved to /var/cache/conftool/dbconfig/20220822-074443-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32681 and previous config saved to /var/cache/conftool/dbconfig/20220822-074437-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32677 and previous config saved to /var/cache/conftool/dbconfig/20220822-073010-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32676 and previous config saved to /var/cache/conftool/dbconfig/20220822-073002-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32675 and previous config saved to /var/cache/conftool/dbconfig/20220822-072938-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32674 and previous config saved to /var/cache/conftool/dbconfig/20220822-072932-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T312972)', diff saved to https://phabricator.wikimedia.org/P32673 and previous config saved to /var/cache/conftool/dbconfig/20220822-072339-marostegui.json
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32672 and previous config saved to /var/cache/conftool/dbconfig/20220822-071506-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32671 and previous config saved to /var/cache/conftool/dbconfig/20220822-071458-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32670 and previous config saved to /var/cache/conftool/dbconfig/20220822-071433-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32669 and previous config saved to /var/cache/conftool/dbconfig/20220822-071427-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2181 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32668 and previous config saved to /var/cache/conftool/dbconfig/20220822-071153-marostegui.json
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 7 hosts with reason: Maintenance
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 7 hosts with reason: Maintenance
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312972)', diff saved to https://phabricator.wikimedia.org/P32667 and previous config saved to /var/cache/conftool/dbconfig/20220822-070804-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32666 and previous config saved to /var/cache/conftool/dbconfig/20220822-070001-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32665 and previous config saved to /var/cache/conftool/dbconfig/20220822-065953-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32664 and previous config saved to /var/cache/conftool/dbconfig/20220822-065929-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32663 and previous config saved to /var/cache/conftool/dbconfig/20220822-065923-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32662 and previous config saved to /var/cache/conftool/dbconfig/20220822-065258-marostegui.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32661 and previous config saved to /var/cache/conftool/dbconfig/20220822-064457-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32660 and previous config saved to /var/cache/conftool/dbconfig/20220822-064448-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32659 and previous config saved to /var/cache/conftool/dbconfig/20220822-064424-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32658 and previous config saved to /var/cache/conftool/dbconfig/20220822-064418-root.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 db1142 db1096', diff saved to https://phabricator.wikimedia.org/P32657 and previous config saved to /var/cache/conftool/dbconfig/20220822-063857-root.json
  • 06:38 marostegui: Install 10.4.26 on db1119, db1142, db1096 T315411
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32656 and previous config saved to /var/cache/conftool/dbconfig/20220822-063752-marostegui.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2180 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32655 and previous config saved to /var/cache/conftool/dbconfig/20220822-063533-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T312972)', diff saved to https://phabricator.wikimedia.org/P32654 and previous config saved to /var/cache/conftool/dbconfig/20220822-062246-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T312972)', diff saved to https://phabricator.wikimedia.org/P32653 and previous config saved to /var/cache/conftool/dbconfig/20220822-061600-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2179 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32652 and previous config saved to /var/cache/conftool/dbconfig/20220822-061553-marostegui.json
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 06:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 06:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2178 to dbctl T311494', diff saved to https://phabricator.wikimedia.org/P32651 and previous config saved to /var/cache/conftool/dbconfig/20220822-055446-marostegui.json
  • 00:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:25 tstarling@deploy1002: Synchronized php-1.39.0-wmf.25/includes/objectcache/SqlBagOStuff.php: fix modtoken comparison T315271 (duration: 03m 45s)

2022-08-21

  • 14:36 Krinkle: krinkle@mwmaint1002 foreachwikiindblist 'all - small' deleteEqualMessages.php
  • 14:33 Krinkle: krinkle@mwmaint1002 foreachwikiindblist 'small - closed' deleteEqualMessages.php
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db[1111,1127,1132].eqiad.wmnet with reason: 10.6 being 10.6
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db[1111,1127,1132].eqiad.wmnet with reason: 10.6 being 10.6
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool 10.6 hosts', diff saved to https://phabricator.wikimedia.org/P32649 and previous config saved to /var/cache/conftool/dbconfig/20220821-123038-ladsgroup.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P32648 and previous config saved to /var/cache/conftool/dbconfig/20220821-121140-root.json
  • 09:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32647 and previous config saved to /var/cache/conftool/dbconfig/20220821-092727-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32646 and previous config saved to /var/cache/conftool/dbconfig/20220821-091221-ladsgroup.json
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32645 and previous config saved to /var/cache/conftool/dbconfig/20220821-085716-ladsgroup.json
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32644 and previous config saved to /var/cache/conftool/dbconfig/20220821-084209-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T314041)', diff saved to https://phabricator.wikimedia.org/P32643 and previous config saved to /var/cache/conftool/dbconfig/20220821-042415-ladsgroup.json
  • 04:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 04:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32642 and previous config saved to /var/cache/conftool/dbconfig/20220821-033020-ladsgroup.json
  • 03:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32641 and previous config saved to /var/cache/conftool/dbconfig/20220821-031514-ladsgroup.json
  • 03:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32640 and previous config saved to /var/cache/conftool/dbconfig/20220821-030008-ladsgroup.json
  • 02:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32639 and previous config saved to /var/cache/conftool/dbconfig/20220821-024502-ladsgroup.json
  • 01:35 rzl@cumin2002: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P32638 and previous config saved to /var/cache/conftool/dbconfig/20220821-013504-rzl.json

2022-08-20

  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32637 and previous config saved to /var/cache/conftool/dbconfig/20220820-221826-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 17:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32636 and previous config saved to /var/cache/conftool/dbconfig/20220820-173723-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32635 and previous config saved to /var/cache/conftool/dbconfig/20220820-172217-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32634 and previous config saved to /var/cache/conftool/dbconfig/20220820-170711-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32633 and previous config saved to /var/cache/conftool/dbconfig/20220820-165203-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32632 and previous config saved to /var/cache/conftool/dbconfig/20220820-115816-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32631 and previous config saved to /var/cache/conftool/dbconfig/20220820-115755-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32630 and previous config saved to /var/cache/conftool/dbconfig/20220820-114249-ladsgroup.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32629 and previous config saved to /var/cache/conftool/dbconfig/20220820-112744-ladsgroup.json
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32628 and previous config saved to /var/cache/conftool/dbconfig/20220820-111238-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T314041)', diff saved to https://phabricator.wikimedia.org/P32627 and previous config saved to /var/cache/conftool/dbconfig/20220820-065528-ladsgroup.json
  • 06:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 06:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32626 and previous config saved to /var/cache/conftool/dbconfig/20220820-065507-ladsgroup.json
  • 06:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32625 and previous config saved to /var/cache/conftool/dbconfig/20220820-064001-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32624 and previous config saved to /var/cache/conftool/dbconfig/20220820-062455-ladsgroup.json
  • 06:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32623 and previous config saved to /var/cache/conftool/dbconfig/20220820-060949-ladsgroup.json
  • 01:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T314041)', diff saved to https://phabricator.wikimedia.org/P32622 and previous config saved to /var/cache/conftool/dbconfig/20220820-012602-ladsgroup.json
  • 01:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 01:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance

2022-08-19

  • 23:37 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on phab2002.codfw.wmnet with reason: new host in setup
  • 23:37 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on phab2002.codfw.wmnet with reason: new host in setup
  • 23:35 mutante: phab2002 - service phd: stopped, phabricator_logmail: disabled, phabricator dumps: disabled, systemd::sysuser: not used (all via Hiera switches) - T280597
  • 23:33 mutante: phab2002 - re-enabled puppet, sshd config ListenAddress fixed by puppet gerrit:824797 - now has phabricator prod role but without LVS/git-ssh - no more error in puppet run - T280597
  • 23:02 mutante: phab2002 - disable puppet, fix sshd_config, restart sshd
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32621 and previous config saved to /var/cache/conftool/dbconfig/20220819-182835-marostegui.json
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32620 and previous config saved to /var/cache/conftool/dbconfig/20220819-181329-marostegui.json
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32619 and previous config saved to /var/cache/conftool/dbconfig/20220819-175823-marostegui.json
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32618 and previous config saved to /var/cache/conftool/dbconfig/20220819-174317-marostegui.json
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T312972)', diff saved to https://phabricator.wikimedia.org/P32617 and previous config saved to /var/cache/conftool/dbconfig/20220819-171052-marostegui.json
  • 17:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32616 and previous config saved to /var/cache/conftool/dbconfig/20220819-171031-marostegui.json
  • 16:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32615 and previous config saved to /var/cache/conftool/dbconfig/20220819-165525-marostegui.json
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32614 and previous config saved to /var/cache/conftool/dbconfig/20220819-164019-marostegui.json
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32613 and previous config saved to /var/cache/conftool/dbconfig/20220819-162513-marostegui.json
  • 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32612 and previous config saved to /var/cache/conftool/dbconfig/20220819-162253-marostegui.json
  • 16:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 16:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 16:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32611 and previous config saved to /var/cache/conftool/dbconfig/20220819-162232-marostegui.json
  • 16:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32610 and previous config saved to /var/cache/conftool/dbconfig/20220819-160726-marostegui.json
  • 15:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32609 and previous config saved to /var/cache/conftool/dbconfig/20220819-155611-ladsgroup.json
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32608 and previous config saved to /var/cache/conftool/dbconfig/20220819-155220-marostegui.json
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32607 and previous config saved to /var/cache/conftool/dbconfig/20220819-154105-ladsgroup.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32606 and previous config saved to /var/cache/conftool/dbconfig/20220819-153714-marostegui.json
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32605 and previous config saved to /var/cache/conftool/dbconfig/20220819-153554-marostegui.json
  • 15:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 15:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32604 and previous config saved to /var/cache/conftool/dbconfig/20220819-153533-marostegui.json
  • 15:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2024.codfw.wmnet with OS bullseye
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32603 and previous config saved to /var/cache/conftool/dbconfig/20220819-152559-ladsgroup.json
  • 15:25 dancy@deploy1002: Installation of scap version "4.14.0" completed for 556 hosts
  • 15:23 dancy@deploy1002: Installing scap version "4.14.0" for 556 hosts
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32602 and previous config saved to /var/cache/conftool/dbconfig/20220819-152027-marostegui.json
  • 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
  • 15:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2023.codfw.wmnet with OS bullseye
  • 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:12 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2024.codfw.wmnet with reason: host reimage
  • 15:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:11 dancy@deploy1002: Finished scap: Backport for Add back fixed width to main content (T315653) (duration: 06m 59s)
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32601 and previous config saved to /var/cache/conftool/dbconfig/20220819-151053-ladsgroup.json
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32600 and previous config saved to /var/cache/conftool/dbconfig/20220819-150521-marostegui.json
  • 15:04 dancy@deploy1002: Started scap: Backport for Add back fixed width to main content (T315653)
  • 14:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
  • 14:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2023.codfw.wmnet with reason: host reimage
  • 14:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2024.codfw.wmnet with OS bullseye
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32599 and previous config saved to /var/cache/conftool/dbconfig/20220819-145015-marostegui.json
  • 14:48 dancy@deploy1002: backport aborted: (duration: 03m 01s)
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T312972)', diff saved to https://phabricator.wikimedia.org/P32598 and previous config saved to /var/cache/conftool/dbconfig/20220819-144755-marostegui.json
  • 14:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312972)', diff saved to https://phabricator.wikimedia.org/P32597 and previous config saved to /var/cache/conftool/dbconfig/20220819-144734-marostegui.json
  • 14:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2023.codfw.wmnet with OS bullseye
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32596 and previous config saved to /var/cache/conftool/dbconfig/20220819-143228-marostegui.json
  • 14:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2024']
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32595 and previous config saved to /var/cache/conftool/dbconfig/20220819-141722-marostegui.json
  • 14:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2024']
  • 14:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T312972)', diff saved to https://phabricator.wikimedia.org/P32594 and previous config saved to /var/cache/conftool/dbconfig/20220819-140216-marostegui.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T312972)', diff saved to https://phabricator.wikimedia.org/P32593 and previous config saved to /var/cache/conftool/dbconfig/20220819-135956-marostegui.json
  • 13:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312972)', diff saved to https://phabricator.wikimedia.org/P32592 and previous config saved to /var/cache/conftool/dbconfig/20220819-135917-marostegui.json
  • 13:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:45 marostegui: Install 10.4.26 on db2111 db2148 db2124
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32591 and previous config saved to /var/cache/conftool/dbconfig/20220819-134411-marostegui.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32590 and previous config saved to /var/cache/conftool/dbconfig/20220819-132905-marostegui.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T312972)', diff saved to https://phabricator.wikimedia.org/P32589 and previous config saved to /var/cache/conftool/dbconfig/20220819-131359-marostegui.json
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T312972)', diff saved to https://phabricator.wikimedia.org/P32588 and previous config saved to /var/cache/conftool/dbconfig/20220819-131139-marostegui.json
  • 13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
  • 13:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 12:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:44 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I0c45b6 (duration: 03m 24s)
  • 11:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 11 hosts with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 11 hosts with reason: Maintenance
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32587 and previous config saved to /var/cache/conftool/dbconfig/20220819-114703-marostegui.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P32586 and previous config saved to /var/cache/conftool/dbconfig/20220819-113157-marostegui.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P32584 and previous config saved to /var/cache/conftool/dbconfig/20220819-111651-marostegui.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32583 and previous config saved to /var/cache/conftool/dbconfig/20220819-110145-marostegui.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32582 and previous config saved to /var/cache/conftool/dbconfig/20220819-105934-marostegui.json
  • 10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32581 and previous config saved to /var/cache/conftool/dbconfig/20220819-105906-marostegui.json
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T314041)', diff saved to https://phabricator.wikimedia.org/P32580 and previous config saved to /var/cache/conftool/dbconfig/20220819-105212-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T314041)', diff saved to https://phabricator.wikimedia.org/P32579 and previous config saved to /var/cache/conftool/dbconfig/20220819-105151-ladsgroup.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P32578 and previous config saved to /var/cache/conftool/dbconfig/20220819-104400-marostegui.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P32577 and previous config saved to /var/cache/conftool/dbconfig/20220819-103645-ladsgroup.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P32576 and previous config saved to /var/cache/conftool/dbconfig/20220819-102854-marostegui.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P32575 and previous config saved to /var/cache/conftool/dbconfig/20220819-102139-ladsgroup.json
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32574 and previous config saved to /var/cache/conftool/dbconfig/20220819-101348-marostegui.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T314041)', diff saved to https://phabricator.wikimedia.org/P32573 and previous config saved to /var/cache/conftool/dbconfig/20220819-100633-ladsgroup.json
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32572 and previous config saved to /var/cache/conftool/dbconfig/20220819-095035-marostegui.json
  • 09:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32571 and previous config saved to /var/cache/conftool/dbconfig/20220819-095014-marostegui.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P32570 and previous config saved to /var/cache/conftool/dbconfig/20220819-093508-marostegui.json
  • 09:23 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 09:21 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:21 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P32569 and previous config saved to /var/cache/conftool/dbconfig/20220819-092002-marostegui.json
  • 09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:11 topranks: running authdns-update on auth1001 to add new include to 0.0.5.e.2.f.d.0.1.0.0.2.ip6.arpa. zone
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32568 and previous config saved to /var/cache/conftool/dbconfig/20220819-090456-marostegui.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T312972)', diff saved to https://phabricator.wikimedia.org/P32567 and previous config saved to /var/cache/conftool/dbconfig/20220819-090146-marostegui.json
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312972)', diff saved to https://phabricator.wikimedia.org/P32566 and previous config saved to /var/cache/conftool/dbconfig/20220819-090124-marostegui.json
  • 08:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 08:56 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P32565 and previous config saved to /var/cache/conftool/dbconfig/20220819-084618-marostegui.json
  • 08:44 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P32564 and previous config saved to /var/cache/conftool/dbconfig/20220819-083112-marostegui.json
  • 08:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
  • 08:16 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T312972)', diff saved to https://phabricator.wikimedia.org/P32563 and previous config saved to /var/cache/conftool/dbconfig/20220819-081606-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T312972)', diff saved to https://phabricator.wikimedia.org/P32562 and previous config saved to /var/cache/conftool/dbconfig/20220819-081356-marostegui.json
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312972)', diff saved to https://phabricator.wikimedia.org/P32561 and previous config saved to /var/cache/conftool/dbconfig/20220819-081317-marostegui.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P32559 and previous config saved to /var/cache/conftool/dbconfig/20220819-075812-marostegui.json
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P32558 and previous config saved to /var/cache/conftool/dbconfig/20220819-074306-marostegui.json
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T312972)', diff saved to https://phabricator.wikimedia.org/P32557 and previous config saved to /var/cache/conftool/dbconfig/20220819-072800-marostegui.json
  • 07:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312972)', diff saved to https://phabricator.wikimedia.org/P32556 and previous config saved to /var/cache/conftool/dbconfig/20220819-072422-marostegui.json
  • 07:20 Amir1: killing cswiki's refreshlinksrecom script T299021
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T314041)', diff saved to https://phabricator.wikimedia.org/P32555 and previous config saved to /var/cache/conftool/dbconfig/20220819-071934-ladsgroup.json
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P32553 and previous config saved to /var/cache/conftool/dbconfig/20220819-070916-marostegui.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P32552 and previous config saved to /var/cache/conftool/dbconfig/20220819-065409-marostegui.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T312972)', diff saved to https://phabricator.wikimedia.org/P32551 and previous config saved to /var/cache/conftool/dbconfig/20220819-063903-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1181 (T312972)', diff saved to https://phabricator.wikimedia.org/P32550 and previous config saved to /var/cache/conftool/dbconfig/20220819-061649-marostegui.json
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312972)', diff saved to https://phabricator.wikimedia.org/P32549 and previous config saved to /var/cache/conftool/dbconfig/20220819-061628-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P32548 and previous config saved to /var/cache/conftool/dbconfig/20220819-061515-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P32547 and previous config saved to /var/cache/conftool/dbconfig/20220819-060122-marostegui.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P32546 and previous config saved to /var/cache/conftool/dbconfig/20220819-054616-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T312972)', diff saved to https://phabricator.wikimedia.org/P32544 and previous config saved to /var/cache/conftool/dbconfig/20220819-053110-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T312972)', diff saved to https://phabricator.wikimedia.org/P32543 and previous config saved to /var/cache/conftool/dbconfig/20220819-052900-marostegui.json
  • 05:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 05:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui: Install 10.6.9 on db2122 and db2146
  • 04:26 hashar@deploy1002: Finished deploy [integration/docroot@09eb565]: zuul: Fix/remove links to non-existent Grafana graphs - T307405 (duration: 00m 13s)
  • 04:26 hashar@deploy1002: Started deploy [integration/docroot@09eb565]: zuul: Fix/remove links to non-existent Grafana graphs - T307405
  • 01:38 tstarling@deploy1002: Synchronized php-1.39.0-wmf.25/includes/objectcache/SqlBagOStuff.php: fix potential mainstash exception file 2 T315274 (duration: 03m 30s)
  • 01:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:31 tstarling@deploy1002: Synchronized php-1.39.0-wmf.25/includes/libs/rdbms/database/DBConnRef.php: fix potential mainstash exception file 1 T315274 (duration: 03m 21s)
  • 01:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-18

  • 23:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:19 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.25 refs T314186
  • 23:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:12 dancy@deploy1002: Finished scap: Backport for gerrit:824573 Revert "Set initial-zoom via JavaScript to avoid font-scaling issue in iPad" (duration: 15m 27s)
  • 23:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:57 dancy@deploy1002: Started scap: Backport for gerrit:824573 Revert "Set initial-zoom via JavaScript to avoid font-scaling issue in iPad"
  • 22:53 mutante: phab1001, phab2001: sudo rm /usr/local/sbin/phab_deploy_ensure_config_ownership (follow-up gerrit:824547 T313953)
  • 22:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:36 dancy@deploy1002: backport aborted: (duration: 00m 12s)
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:31 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.23 refs T314186
  • 22:25 dancy: Rolling the train back to group1 due to T315620
  • 22:25 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@ff0a0e2]: (no justification provided) (duration: 00m 19s)
  • 22:24 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@ff0a0e2]: (no justification provided)
  • 22:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:09 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2024.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:48 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2024
  • 21:47 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2024
  • 21:20 brennen: end of UTC late backport and config window
  • 21:20 brennen@deploy1002: Finished scap: Set initial-zoom via JavaScript to avoid font-scaling issue in iPad (T311795) (duration: 10m 16s)
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: elastic 7 upgrade
  • 21:14 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: elastic 7 upgrade
  • 21:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:09 brennen@deploy1002: Started scap: Set initial-zoom via JavaScript to avoid font-scaling issue in iPad (T311795)
  • 21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 20:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 20:39 brennen@deploy1002: Finished scap: Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) (T313657) (duration: 07m 09s)
  • 20:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 20:32 brennen@deploy1002: Started scap: Allow admin to grant/revoke "transwiki" group on zh(wikt|wb|wq|ws) (T313657)
  • 20:29 brennen@deploy1002: Finished scap: Deploy partial action blocks to cswiki (T315525) (duration: 19m 16s)
  • 20:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2002.codfw.wmnet with OS bullseye
  • 20:09 brennen@deploy1002: Started scap: Deploy partial action blocks to cswiki (T315525)
  • 20:00 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 19:57 ottomata: re-enable puppet on an-master*
  • 19:47 ottomata: temporarily disable puppet on an-master100* while applying change in test cluster - T312858
  • 19:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 19:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 19:16 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1007.eqiad.wmnet with reason: host reimage
  • 19:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:00 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1007.eqiad.wmnet with OS bullseye
  • 18:58 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:57 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1006.eqiad.wmnet with OS bullseye
  • 18:55 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 18:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:40 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
  • 18:36 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-stretch2001.codfw.wmnet with reason: host reimage
  • 18:17 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 18:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:13 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.25 refs T314186
  • 18:08 dancy: Testing stashbot behavior #2. T315444, T314613
  • 18:07 dancy: Testing stashbot behavior #1 T315444
  • 17:56 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 17:54 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:53 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:53 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:52 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:52 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:52 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:48 robh@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 17:46 dancy@deploy1002: backport aborted: (duration: 00m 21s)
  • 17:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 17:08 hashar@deploy1002: Finished deploy [integration/docroot@1aca57b]: doc: update links from /mw-tools-scap/ to /scap/ - T315541 (duration: 00m 09s)
  • 17:08 hashar@deploy1002: Started deploy [integration/docroot@1aca57b]: doc: update links from /mw-tools-scap/ to /scap/ - T315541
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 9 hosts with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:47 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 20s)
  • 16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 16:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T312972)', diff saved to https://phabricator.wikimedia.org/P32541 and previous config saved to /var/cache/conftool/dbconfig/20220818-164456-marostegui.json
  • 16:44 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25 refs T314186
  • 16:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32540 and previous config saved to /var/cache/conftool/dbconfig/20220818-162950-marostegui.json
  • 16:26 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
  • 16:26 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
  • 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 16:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 16:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P32539 and previous config saved to /var/cache/conftool/dbconfig/20220818-161444-marostegui.json
  • 15:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T312972)', diff saved to https://phabricator.wikimedia.org/P32538 and previous config saved to /var/cache/conftool/dbconfig/20220818-155938-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T312972)', diff saved to https://phabricator.wikimedia.org/P32537 and previous config saved to /var/cache/conftool/dbconfig/20220818-155410-marostegui.json
  • 15:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312972)', diff saved to https://phabricator.wikimedia.org/P32536 and previous config saved to /var/cache/conftool/dbconfig/20220818-155348-marostegui.json
  • 15:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32535 and previous config saved to /var/cache/conftool/dbconfig/20220818-153842-marostegui.json
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32534 and previous config saved to /var/cache/conftool/dbconfig/20220818-152335-marostegui.json
  • 15:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312972)', diff saved to https://phabricator.wikimedia.org/P32533 and previous config saved to /var/cache/conftool/dbconfig/20220818-150829-marostegui.json
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T312972)', diff saved to https://phabricator.wikimedia.org/P32532 and previous config saved to /var/cache/conftool/dbconfig/20220818-150621-marostegui.json
  • 15:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 15:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312972)', diff saved to https://phabricator.wikimedia.org/P32531 and previous config saved to /var/cache/conftool/dbconfig/20220818-150601-marostegui.json
  • 15:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-stretch2001.codfw.wmnet with OS bullseye
  • 14:58 dancy@deploy1002: Finished deploy [integration/docroot@a43ff3b]: (no justification provided) (duration: 00m 38s)
  • 14:58 dancy@deploy1002: Started deploy [integration/docroot@a43ff3b]: (no justification provided)
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32530 and previous config saved to /var/cache/conftool/dbconfig/20220818-145055-marostegui.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32529 and previous config saved to /var/cache/conftool/dbconfig/20220818-143549-marostegui.json
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312972)', diff saved to https://phabricator.wikimedia.org/P32528 and previous config saved to /var/cache/conftool/dbconfig/20220818-142043-marostegui.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T312972)', diff saved to https://phabricator.wikimedia.org/P32527 and previous config saved to /var/cache/conftool/dbconfig/20220818-141835-marostegui.json
  • 14:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32526 and previous config saved to /var/cache/conftool/dbconfig/20220818-141815-marostegui.json
  • 14:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:10 TheresNoTime: UTC afternoon backport window done
  • 14:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:09 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new Vector skin on select pages (take 2) (T314286) (duration: 03m 07s)
  • 14:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32525 and previous config saved to /var/cache/conftool/dbconfig/20220818-140309-marostegui.json
  • 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:01 TheresNoTime: extending deployment window slightly
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32524 and previous config saved to /var/cache/conftool/dbconfig/20220818-134803-marostegui.json
  • 13:45 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert: Enable new Vector skin on select pages (T314286) (duration: 03m 35s)
  • 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:37 jbond: uploaded spicerack_3.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 13:37 samtar@deploy1002: scap failed: average error rate on 5/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org for details)
  • 13:37 jbond: release spicerack 3.2.0
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:33 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32523 and previous config saved to /var/cache/conftool/dbconfig/20220818-133257-marostegui.json
  • 13:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 13:31 jayme@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=toolhub
  • 13:31 samtar@deploy1002: Synchronized wmf-config: Config: Remove unused config for Echo notification emails (T314604) (duration: 03m 25s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 awight@deploy1002: Finished deploy [kartotherian/deploy@672af45]: Update kartotherian to 285fc7d (duration: 03m 45s)
  • 13:26 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 13:25 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 13:24 awight@deploy1002: Started deploy [kartotherian/deploy@672af45]: Update kartotherian to 285fc7d
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 samtar@deploy1002: Synchronized wmf-config: Config: Disable DiscussionTools pageframe everywhere except labs and mediawikiwiki (duration: 03m 26s)
  • 13:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:15 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
  • 13:13 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: InitialiseSettings-labs: Enable Phonos on beta enwiki (T314294) (duration: 03m 30s)
  • 13:01 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 12:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:33 reedy@deploy1002: Synchronized wmf-config/: SFS config updates (duration: 03m 25s)
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32522 and previous config saved to /var/cache/conftool/dbconfig/20220818-123241-marostegui.json
  • 12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312972)', diff saved to https://phabricator.wikimedia.org/P32521 and previous config saved to /var/cache/conftool/dbconfig/20220818-123220-marostegui.json
  • 12:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:28 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Set wgSFSReportOnly in here (duration: 03m 27s)
  • 12:25 marostegui: Install 10.6.9 on pc1014
  • 12:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32520 and previous config saved to /var/cache/conftool/dbconfig/20220818-121714-marostegui.json
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:04 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32519 and previous config saved to /var/cache/conftool/dbconfig/20220818-120208-marostegui.json
  • 11:55 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312972)', diff saved to https://phabricator.wikimedia.org/P32518 and previous config saved to /var/cache/conftool/dbconfig/20220818-114702-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T312972)', diff saved to https://phabricator.wikimedia.org/P32517 and previous config saved to /var/cache/conftool/dbconfig/20220818-114555-marostegui.json
  • 11:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312972)', diff saved to https://phabricator.wikimedia.org/P32516 and previous config saved to /var/cache/conftool/dbconfig/20220818-114518-marostegui.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P32515 and previous config saved to /var/cache/conftool/dbconfig/20220818-113655-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'depool db1112', diff saved to https://phabricator.wikimedia.org/P32514 and previous config saved to /var/cache/conftool/dbconfig/20220818-113556-ladsgroup.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32513 and previous config saved to /var/cache/conftool/dbconfig/20220818-113012-marostegui.json
  • 11:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32511 and previous config saved to /var/cache/conftool/dbconfig/20220818-111506-marostegui.json
  • 11:00 jayme: kubernetes2015:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312972)', diff saved to https://phabricator.wikimedia.org/P32510 and previous config saved to /var/cache/conftool/dbconfig/20220818-110000-marostegui.json
  • 10:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T312972)', diff saved to https://phabricator.wikimedia.org/P32509 and previous config saved to /var/cache/conftool/dbconfig/20220818-105552-marostegui.json
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312972)', diff saved to https://phabricator.wikimedia.org/P32508 and previous config saved to /var/cache/conftool/dbconfig/20220818-105531-marostegui.json
  • 10:55 jayme: kubernetes2016:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
  • 10:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1166', diff saved to https://phabricator.wikimedia.org/P32506 and previous config saved to /var/cache/conftool/dbconfig/20220818-104731-ladsgroup.json
  • 10:45 reedy@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/StopForumSpam/includes/: T315447 (duration: 03m 36s)
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P32505 and previous config saved to /var/cache/conftool/dbconfig/20220818-104552-ladsgroup.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32504 and previous config saved to /var/cache/conftool/dbconfig/20220818-104025-marostegui.json
  • 10:37 jayme: kubernetes2006:~$ sudo systemctl reset-failed ifup@ens13.service - T273026
  • 10:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:27 ladsgroup@deploy1002: Synchronized wmf-config/etcd.php: Config: Drop now-unused wmfEtcdApplyDBConfig() (T298485) (duration: 03m 36s)
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32503 and previous config saved to /var/cache/conftool/dbconfig/20220818-102519-marostegui.json
  • 10:22 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:22 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:22 jayme: kubernetes2005:~$ sudo systemctl status ifup@ens13.service - T273026
  • 10:20 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:19 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:16 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:16 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:16 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Call wmfApplyEtcdDBConfig() directly in CS.php (T298485) (duration: 03m 46s)
  • 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312972)', diff saved to https://phabricator.wikimedia.org/P32501 and previous config saved to /var/cache/conftool/dbconfig/20220818-101013-marostegui.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T312972)', diff saved to https://phabricator.wikimedia.org/P32500 and previous config saved to /var/cache/conftool/dbconfig/20220818-100806-marostegui.json
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312972)', diff saved to https://phabricator.wikimedia.org/P32499 and previous config saved to /var/cache/conftool/dbconfig/20220818-100744-marostegui.json
  • 10:03 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 10:00 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 10:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:56 ladsgroup@deploy1002: Synchronized wmf-config/etcd.php: Config: Allow passing arguments to wmfEtcdApplyDBConfig() (T298485) (duration: 03m 40s)
  • 09:53 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:53 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:53 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:52 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32498 and previous config saved to /var/cache/conftool/dbconfig/20220818-095238-marostegui.json
  • 09:47 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:44 jayme: dnsdisc depooling codfw for services running in kubernetes cluster (for 30-60min due to T310483, T260661)
  • 09:43 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32497 and previous config saved to /var/cache/conftool/dbconfig/20220818-093732-marostegui.json
  • 09:34 _joe_: updating vopsbot to 0.3.0
  • 09:33 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
  • 09:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312972)', diff saved to https://phabricator.wikimedia.org/P32496 and previous config saved to /var/cache/conftool/dbconfig/20220818-092226-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312972)', diff saved to https://phabricator.wikimedia.org/P32495 and previous config saved to /var/cache/conftool/dbconfig/20220818-092219-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 09:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32494 and previous config saved to /var/cache/conftool/dbconfig/20220818-092130-marostegui.json
  • 09:19 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
  • 09:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
  • 09:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:10 ladsgroup@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Simplify wmfEtcdApplyDBConfig() a bit (T298485), Part II (duration: 03m 11s)
  • 09:09 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
  • 09:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32493 and previous config saved to /var/cache/conftool/dbconfig/20220818-090624-marostegui.json
  • 09:06 ladsgroup@deploy1002: Synchronized wmf-config/etcd.php: Config: Simplify wmfEtcdApplyDBConfig() a bit (T298485), Part I (duration: 03m 02s)
  • 09:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:59 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
  • 08:59 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32492 and previous config saved to /var/cache/conftool/dbconfig/20220818-085118-marostegui.json
  • 08:49 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 08:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
  • 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32491 and previous config saved to /var/cache/conftool/dbconfig/20220818-083612-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T312972)', diff saved to https://phabricator.wikimedia.org/P32490 and previous config saved to /var/cache/conftool/dbconfig/20220818-083505-marostegui.json
  • 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312972)', diff saved to https://phabricator.wikimedia.org/P32489 and previous config saved to /var/cache/conftool/dbconfig/20220818-083417-marostegui.json
  • 08:33 vgutierrez: upgrade to ATS 9.1.3 in cp5014 and cp5016 - T309651
  • 08:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:26 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks fields in wikidata and new wikis (T312865) (duration: 03m 20s)
  • 08:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32488 and previous config saved to /var/cache/conftool/dbconfig/20220818-081911-marostegui.json
  • 08:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32487 and previous config saved to /var/cache/conftool/dbconfig/20220818-081627-ladsgroup.json
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:09 marostegui: dbmaint Promote pc1013 as pc3 master T315526
  • 08:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:07 marostegui@deploy1002: Synchronized wmf-config/ProductionServices.php: Promote pc1013 to pc3 master T315526 (duration: 03m 11s)
  • 08:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32486 and previous config saved to /var/cache/conftool/dbconfig/20220818-080405-marostegui.json
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32485 and previous config saved to /var/cache/conftool/dbconfig/20220818-080122-ladsgroup.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312972)', diff saved to https://phabricator.wikimedia.org/P32484 and previous config saved to /var/cache/conftool/dbconfig/20220818-074859-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312972)', diff saved to https://phabricator.wikimedia.org/P32483 and previous config saved to /var/cache/conftool/dbconfig/20220818-074652-marostegui.json
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32482 and previous config saved to /var/cache/conftool/dbconfig/20220818-074618-ladsgroup.json
  • 07:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:39 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/GrowthExperiments/modules/ext.growthExperiments.HelpPanel/SuggestedEditsGuidance.js: 520cd7b: Fix structured task restriction check (T315516) (duration: 03m 17s)
  • 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1104 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P32480 and previous config saved to /var/cache/conftool/dbconfig/20220818-073113-ladsgroup.json
  • 07:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 07:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 07:26 godog: roll-restart swift-proxy to apply bumped memcached limits T314914
  • 07:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:24 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/includes/specials/SpecialRecentChangesLinked.php: Backport: Revert "Revert "SpecialRecentChangesLinked: Use rdbms code for building the main query"" (duration: 03m 31s)
  • 07:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 07:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 06:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T314041)', diff saved to https://phabricator.wikimedia.org/P32479 and previous config saved to /var/cache/conftool/dbconfig/20220818-064124-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32478 and previous config saved to /var/cache/conftool/dbconfig/20220818-062618-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 06:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maint
  • 06:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maint
  • 06:11 Amir1: dbmaint@s8 eqiad (T314369 T312863 T309311 T60674 T303603 T310485)
  • 06:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P32477 and previous config saved to /var/cache/conftool/dbconfig/20220818-061112-ladsgroup.json
  • 06:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1104 T314369', diff saved to https://phabricator.wikimedia.org/P32476 and previous config saved to /var/cache/conftool/dbconfig/20220818-060707-ladsgroup.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1109 to s8 primary and set section read-write T314369', diff saved to https://phabricator.wikimedia.org/P32475 and previous config saved to /var/cache/conftool/dbconfig/20220818-060213-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T314369', diff saved to https://phabricator.wikimedia.org/P32474 and previous config saved to /var/cache/conftool/dbconfig/20220818-060137-ladsgroup.json
  • 06:01 Amir1: Starting s8 eqiad failover from db1104 to db1109 - T314369
  • 05:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T314041)', diff saved to https://phabricator.wikimedia.org/P32473 and previous config saved to /var/cache/conftool/dbconfig/20220818-055606-ladsgroup.json
  • 04:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 31 hosts with reason: Primary switchover s8 T314369
  • 04:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 31 hosts with reason: Primary switchover s8 T314369
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1109 with weight 0 T314369', diff saved to https://phabricator.wikimedia.org/P32471 and previous config saved to /var/cache/conftool/dbconfig/20220818-045218-ladsgroup.json
  • 04:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 31 hosts with reason: Primary switchover s8 T314369
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 31 hosts with reason: Primary switchover s8 T314369
  • 04:30 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set scaling_governor to performance, attempt 2, T315398
  • 02:15 TimStarling: on mw1411, mw1413, mw1419, mw1429, mw1431, mw1433: set scaling_governor to performance T315398
  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:48 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:39 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']
  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:06 eileen___: civicrm upgraded from 97638e58 to edfe2f16
  • 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
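
The cookbook entries for this day follow a regular START / END (STATUS) pattern, with repeated attempts such as the firmware upgrade on kubernetes2023. The following is a minimal sketch, assuming the entry layout seen in this log, of how END outcomes could be tallied per cookbook. The regex, the function name `tally_cookbook_results`, and the `sample` list are illustrative assumptions and not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Assumed layout, inferred from the entries in this log (not an official spec):
#   HH:MM user@host: END (STATUS) - Cookbook NAME (exit_code=N) ...
END_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook, status); all other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = END_RE.match(line)
        if match:
            counts[(match.group("cookbook"), match.group("status"))] += 1
    return counts


# A few entries copied from the 2022-08-18 section of this log.
sample = [
    "  • 00:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']",
    "  • 00:48 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']",
    "  • 00:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']",
    "  • 00:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']",
    "  • 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2023.codfw.wmnet']",
    "  • 00:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes2023.mgmt.codfw.wmnet with reboot policy FORCED",
]

results = tally_cookbook_results(sample)
for (cookbook, status), count in sorted(results.items()):
    print(f"{cookbook} {status}: {count}")
```

START entries and free-form notes do not match the pattern and are skipped, so the tally reflects only completed cookbook runs.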

2022-08-17

  • 23:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:57 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes2023
  • 23:57 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes2023
  • 23:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:23 mutante: phab2002 - chmod -R phd /srv/repos | find /srv/repos/ -gid 498 -exec chown phd:phd {} \; T313360
  • 23:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2002.codfw.wmnet']
  • 23:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-stretch2001.codfw.wmnet']
  • 23:03 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-stretch2001.codfw.wmnet']
  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:34 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 17s)
  • 22:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:31 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25 refs T314186
  • 22:16 eileen___: civicrm upgraded from 4be0724d to 97638e58
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:16 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs: Backport: Remove indexExists check for page_name_title index (duration: 03m 12s)
  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:13 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/FlaggedRevs/frontend/FlaggedRevsUIHooks.php: Backport: Do not attempt to create a FlaggableWikiPage when the title can't exist (T315479) (duration: 03m 26s)
  • 21:08 ejegg: updated civicrm from c228e3d7 to 4be0724d
  • 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2005.codfw.wmnet with OS bullseye
  • 21:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
  • 20:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2005.codfw.wmnet with reason: host reimage
  • 20:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:36 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: InitialiseSettings: Add wmgUsePhonos (default => false) (T314294) (duration: 03m 29s)
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:30 samtar@deploy1002: Synchronized wmf-config/extension-list: Config: extension-list: Add Phonos (T314294) (duration: 03m 17s)
  • 20:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2005.codfw.wmnet with OS bullseye
  • 20:27 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2004.codfw.wmnet with OS bullseye
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1ddc661: QuickSurveys: Disable extension on JA wiki (T311015) (duration: 03m 19s)
  • 20:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 2cf80d1: QuickSurveys: Remove research incentive survey from BN wiki (T314333) (duration: 03m 24s)
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
  • 20:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2004.codfw.wmnet with reason: host reimage
  • 19:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:11 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs T314186 (duration: 03m 15s)
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:07 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs T314186
  • 19:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1073.eqiad.wmnet with OS bullseye
  • 19:01 demon@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 24s)
  • 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:58 demon@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.25 refs T314186
  • 18:58 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-logging2004.codfw.wmnet with OS bullseye
  • 18:55 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 18:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1073.eqiad.wmnet with reason: host reimage
  • 18:40 urandom: disabling reserved space on codfw nodes (RESTBase), /dev/md2 (aka /srv/cassandra/instance-data) -- T314941
  • 18:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
  • 18:38 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1073.eqiad.wmnet with reason: host reimage
  • 18:36 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T314041)', diff saved to https://phabricator.wikimedia.org/P32469 and previous config saved to /var/cache/conftool/dbconfig/20220817-183223-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T314041)', diff saved to https://phabricator.wikimedia.org/P32468 and previous config saved to /var/cache/conftool/dbconfig/20220817-183202-ladsgroup.json
  • 18:25 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1073.eqiad.wmnet with OS bullseye
  • 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32467 and previous config saved to /var/cache/conftool/dbconfig/20220817-181656-ladsgroup.json
  • 18:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1056.eqiad.wmnet with OS bullseye
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P32466 and previous config saved to /var/cache/conftool/dbconfig/20220817-180150-ladsgroup.json
  • 18:01 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
  • 17:48 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging2004.codfw.wmnet with OS bullseye
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T314041)', diff saved to https://phabricator.wikimedia.org/P32465 and previous config saved to /var/cache/conftool/dbconfig/20220817-174644-ladsgroup.json
  • 17:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1056.eqiad.wmnet with reason: host reimage
  • 17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
  • 17:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
  • 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1056.eqiad.wmnet with reason: host reimage
  • 17:33 ladsgroup@deploy1002: Synchronized portals: Migrate wikinews.org to the modern portals (duration: 03m 32s)
  • 17:31 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
  • 17:30 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
  • 17:29 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: Migrate wikinews.org to the modern portals (duration: 03m 29s)
  • 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1056.eqiad.wmnet with OS bullseye
  • 17:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-logging2004.codfw.wmnet with OS bullseye
  • 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host graphite2004.codfw.wmnet with OS bullseye
  • 16:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
  • 16:54 sbassett@deploy1002: Synchronized wmf-config/CommonSettings.php: Enable StopForumSpam on candidate wikis (CS.php) - T273220 (duration: 03m 26s)
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host graphite2004.codfw.wmnet with OS bullseye
  • 16:50 sbassett@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable StopForumSpam on candidate wikis (IS.php) - T273220 (duration: 03m 20s)
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32463 and previous config saved to /var/cache/conftool/dbconfig/20220817-162655-root.json
  • 16:24 cwhite: restart logmsgbot T257861
  • 16:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 16:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1079.eqiad.wmnet with OS bullseye
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32462 and previous config saved to /var/cache/conftool/dbconfig/20220817-161151-root.json
  • 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P32461 and previous config saved to /var/cache/conftool/dbconfig/20220817-155653-root.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32460 and previous config saved to /var/cache/conftool/dbconfig/20220817-155646-root.json
  • 15:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:54 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: jawiki: Restrict abusefilter log access (2) (T315199) (duration: 03m 47s)
  • 15:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1079.eqiad.wmnet with reason: host reimage
  • 15:52 jbond: push out update for linux-image-amd64 on bullseye
  • 15:51 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1079.eqiad.wmnet with reason: host reimage
  • 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:50 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: jawiki: Restrict abusefilter log access (1) (T315199) (duration: 03m 25s)
  • 15:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:43 TheresNoTime: finished deploying RESTBase is not enabled on closed wikis (T315383)
  • 15:42 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=1)
  • 15:42 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 15:42 samtar@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: RESTBase is not enabled on closed wikis (T315383) (duration: 03m 27s)
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P32458 and previous config saved to /var/cache/conftool/dbconfig/20220817-154148-root.json
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P32457 and previous config saved to /var/cache/conftool/dbconfig/20220817-154142-root.json
  • 15:41 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 15:41 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 15:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1079.eqiad.wmnet with OS bullseye
  • 15:37 jbond: install net-snmp updates
  • 15:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P32455 and previous config saved to /var/cache/conftool/dbconfig/20220817-152643-root.json
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32454 and previous config saved to /var/cache/conftool/dbconfig/20220817-152637-root.json
  • 15:24 TheresNoTime: deploying RESTBase is not enabled on closed wikis (T315383) out of window
  • 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P32453 and previous config saved to /var/cache/conftool/dbconfig/20220817-151139-root.json
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32452 and previous config saved to /var/cache/conftool/dbconfig/20220817-151132-root.json
  • 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P32450 and previous config saved to /var/cache/conftool/dbconfig/20220817-145634-root.json
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P32449 and previous config saved to /var/cache/conftool/dbconfig/20220817-145628-root.json
  • 14:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host graphite2004.codfw.wmnet with OS bullseye
  • 14:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32447 and previous config saved to /var/cache/conftool/dbconfig/20220817-144129-root.json
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P32446 and previous config saved to /var/cache/conftool/dbconfig/20220817-144123-root.json
  • 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
  • 14:32 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on graphite2004.codfw.wmnet with reason: host reimage
  • 14:18 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-stretch2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:18 marostegui: Redact new wikis guwwiktionary pcmwiki bjnwiktionary T312214 T310879 T309056
  • 14:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-stretch2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host graphite2004.codfw.wmnet with OS bullseye
  • 14:01 taavi: UTC afternoon deploys done
  • 14:00 taavi@deploy1002: Finished scap: Backport for gerrit:823697 Add wgDiscussionToolsEnablePermalinksBackend config (duration: 19m 24s)
  • 13:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-stretch2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:51 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-stretch2002
  • 13:42 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-stretch2002
  • 13:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-stretch2001
  • 13:41 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-stretch2001
  • 13:41 taavi@deploy1002: Started scap: Backport for gerrit:823697 Add wgDiscussionToolsEnablePermalinksBackend config
  • 13:38 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Realtime Preview on Group 1 (T314182) (duration: 03m 26s)
  • 13:32 taavi@deploy1002: Synchronized php-1.39.0-wmf.25/extensions/DiscussionTools/includes/Hooks/DataUpdatesHooks.php: Backport: Add try…catch in failing deferred update (T315383) (duration: 03m 18s)
  • 13:27 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: lots of DiscussionTools and other changes (duration: 03m 11s)
  • 13:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@141f179]: (no justification provided) (duration: 00m 10s)
  • 13:19 mforns@deploy1002: Started deploy [airflow-dags/analytics@141f179]: (no justification provided)
  • 12:39 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (T310776, T312209, T309054) (duration: 03m 30s)
  • 12:30 urbanecm@deploy1002: Synchronized dblists-index.php: Creating bjnwiktionary (T312209) (duration: 03m 32s)
  • 12:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating bjnwiktionary (T312209) (duration: 03m 13s)
  • 12:23 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating bjnwiktionary (T312209) (duration: 03m 19s)
  • 12:20 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating bjnwiktionary (T312209) (duration: 03m 27s)
  • 12:17 jbond: remove prometheus-ipmi-exporter from stretch
  • 12:16 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating bjnwiktionary (T312209)
  • 12:15 jbond: copy prometheus-ipmi-exporter package from buster to stretch
  • 12:12 urbanecm@deploy1002: Synchronized dblists: Creating bjnwiktionary (T312209) (duration: 03m 33s)
  • 12:09 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating bjnwiktionary (T312209) (duration: 03m 29s)
  • 12:02 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating guwwiktionary (T309054) (duration: 03m 34s)
  • 12:01 jbond: copy prometheus-ipmi-exporter package from bullseye to buster
  • 11:58 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating guwwiktionary (T309054) (duration: 03m 43s)
  • 11:54 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating guwwiktionary (T309054) (duration: 03m 25s)
  • 11:51 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating guwwiktionary (T309054)
  • 11:47 urbanecm@deploy1002: Synchronized dblists: Creating guwwiktionary (T309054) (duration: 03m 11s)
  • 11:44 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating guwwiktionary (T309054) (duration: 03m 08s)
  • 11:38 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
  • 11:38 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache sretest1001.eqiad.wmnet sretest1002.eqiad.wmnet on all recursors
  • 11:38 urbanecm@deploy1002: Synchronized langlist: Creating pcmwiki (T310776) (duration: 03m 42s)
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating pcmwiki (T310776) (duration: 03m 18s)
  • 11:31 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating pcmwiki (T310776) (duration: 03m 24s)
  • 11:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating pcmwiki (T310776) (duration: 03m 13s)
  • 11:24 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating pcmwiki (T310776)
  • 11:20 urbanecm@deploy1002: Synchronized dblists: Creating pcmwiki (T310776) (duration: 03m 13s)
  • 11:17 urbanecm@deploy1002: Synchronized wmf-config/db-production.php: Creating pcmwiki (T310776) (duration: 03m 22s)
  • 11:11 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:11 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32444 and previous config saved to /var/cache/conftool/dbconfig/20220817-092244-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32443 and previous config saved to /var/cache/conftool/dbconfig/20220817-092125-root.json
  • 09:10 hashar: Upgraded Gerrit from 3.4.4 to 3.4.5 # T315408
  • 09:09 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit1001 # T315408 (duration: 00m 09s)
  • 09:09 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit1001 # T315408
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32442 and previous config saved to /var/cache/conftool/dbconfig/20220817-090739-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32441 and previous config saved to /var/cache/conftool/dbconfig/20220817-090620-root.json
  • 09:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit2002 # T315408 (duration: 00m 11s)
  • 09:03 hashar@deploy1002: Started deploy [gerrit/gerrit@e11e6a7]: Gerrit to 3.4.5 on gerrit2002 # T315408
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32440 and previous config saved to /var/cache/conftool/dbconfig/20220817-085235-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32439 and previous config saved to /var/cache/conftool/dbconfig/20220817-085224-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32438 and previous config saved to /var/cache/conftool/dbconfig/20220817-085136-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32437 and previous config saved to /var/cache/conftool/dbconfig/20220817-085115-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32436 and previous config saved to /var/cache/conftool/dbconfig/20220817-083730-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32435 and previous config saved to /var/cache/conftool/dbconfig/20220817-083719-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32434 and previous config saved to /var/cache/conftool/dbconfig/20220817-083631-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 25%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32433 and previous config saved to /var/cache/conftool/dbconfig/20220817-083611-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32432 and previous config saved to /var/cache/conftool/dbconfig/20220817-082226-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32431 and previous config saved to /var/cache/conftool/dbconfig/20220817-082215-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32430 and previous config saved to /var/cache/conftool/dbconfig/20220817-082127-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32429 and previous config saved to /var/cache/conftool/dbconfig/20220817-082106-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32428 and previous config saved to /var/cache/conftool/dbconfig/20220817-080721-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32427 and previous config saved to /var/cache/conftool/dbconfig/20220817-080710-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 10%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32426 and previous config saved to /var/cache/conftool/dbconfig/20220817-080622-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32425 and previous config saved to /var/cache/conftool/dbconfig/20220817-080602-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 2%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32424 and previous config saved to /var/cache/conftool/dbconfig/20220817-075216-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32423 and previous config saved to /var/cache/conftool/dbconfig/20220817-075206-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 5%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32422 and previous config saved to /var/cache/conftool/dbconfig/20220817-075118-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 2%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32421 and previous config saved to /var/cache/conftool/dbconfig/20220817-075057-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32420 and previous config saved to /var/cache/conftool/dbconfig/20220817-073712-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32419 and previous config saved to /var/cache/conftool/dbconfig/20220817-073701-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32418 and previous config saved to /var/cache/conftool/dbconfig/20220817-073613-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1132 (re)pooling @ 1%: Repooling 10.6', diff saved to https://phabricator.wikimedia.org/P32417 and previous config saved to /var/cache/conftool/dbconfig/20220817-073553-root.json
  • 07:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T314041)', diff saved to https://phabricator.wikimedia.org/P32416 and previous config saved to /var/cache/conftool/dbconfig/20220817-073141-ladsgroup.json
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 07:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32415 and previous config saved to /var/cache/conftool/dbconfig/20220817-073052-ladsgroup.json
  • 07:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32414 and previous config saved to /var/cache/conftool/dbconfig/20220817-071546-ladsgroup.json
  • 07:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P32413 and previous config saved to /var/cache/conftool/dbconfig/20220817-070040-ladsgroup.json
  • 06:54 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32412 and previous config saved to /var/cache/conftool/dbconfig/20220817-064534-ladsgroup.json
  • 06:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 06:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 06:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
  • 06:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 06:36 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 06:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1034.eqiad.wmnet with reason: host reimage
  • 06:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 06:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 06:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 06:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 06:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
  • 06:20 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
  • 06:20 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
  • 06:17 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 06:15 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
  • 06:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
  • 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1028.eqiad.wmnet with reason: host reimage
  • 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1033.eqiad.wmnet with reason: host reimage
  • 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1032.eqiad.wmnet with reason: host reimage
  • 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1029.eqiad.wmnet with reason: host reimage
  • 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 06:10 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1027.eqiad.wmnet with reason: host reimage
  • 06:00 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 05:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 05:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 05:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 05:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 05:31 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 05:26 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 05:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
  • 05:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
  • 05:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1030.eqiad.wmnet with OS bullseye
  • 05:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
  • 05:13 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
  • 05:03 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 05:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 04:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 04:59 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bullseye
  • 04:58 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
  • 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 04:57 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 04:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 04:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
  • 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 04:48 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 04:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 04:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 04:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bullseye
  • 04:31 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1030.eqiad.wmnet with OS bullseye
  • 04:25 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1034.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1033.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1032.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1029.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1028.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1027.eqiad.wmnet with OS bullseye
  • 04:23 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1030.eqiad.wmnet with reason: host reimage
  • 04:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1030.eqiad.wmnet with OS bullseye
  • 04:08 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[1051-1052].eqiad.wmnet
  • 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:45 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 02:32 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1051-1052].eqiad.wmnet
  • 02:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic[1051-1052].eqiad.wmnet
  • 02:16 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1051-1052].eqiad.wmnet
  • 02:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[1049-1050].eqiad.wmnet
  • 02:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:59 sbassett: Re-deployed security fix for T309894 to wmf.25
  • 01:54 sbassett: Re-deployed security fix for T309894 to wmf.23
  • 01:49 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 01:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2005']
  • 01:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2005']
  • 01:12 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic[1049-1050].eqiad.wmnet
  • 01:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kafka-logging2004']
  • 00:26 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2004']
  • 00:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kafka-logging2004']
  • 00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kafka-logging2004']
  • 00:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 00:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED

2022-08-16

  • 23:56 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-logging2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:44 mutante: phab1001 - repeated rsync of /srv/repos to phab2002, then chown -R phd /srv/repos/ (without setting the group) - this way UID is fixed and privs match exactly phab1001 - T313360
  • 23:37 mutante: phab2002 - chown -R phd:www-data /srv/repos/ (because of UID mismatch) T313360
  • 23:32 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['graphite2004']
  • 23:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
  • 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2005
  • 23:27 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2005
  • 23:27 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2004
  • 23:26 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2004
  • 23:24 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['graphite2004']
  • 23:22 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
  • 23:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['graphite2004']
  • 23:20 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['graphite2004']
  • 23:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1026.eqiad.wmnet with OS bullseye
  • 23:04 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
  • 23:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
  • 23:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host graphite2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1026.eqiad.wmnet with reason: host reimage
  • 22:59 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1025.eqiad.wmnet with reason: host reimage
  • 22:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1026.eqiad.wmnet with OS bullseye
  • 22:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 22:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 22:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 22:30 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 22:29 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 22:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Large deletions affecting this replica
  • 22:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Large deletions affecting this replica
  • 22:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 22:07 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1025.eqiad.wmnet with reason: host reimage
  • 21:56 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:56 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:54 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.25 refs T314186
  • 21:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:53 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1048.eqiad.wmnet
  • 21:53 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:47 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:45 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1075.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:44 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:42 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic1048.eqiad.wmnet
  • 21:41 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts elastic1048.eqiad.wmnet
  • 21:31 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts elastic1048.eqiad.wmnet
  • 21:29 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host graphite2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1075.eqiad.wmnet with reason: host reimage
  • 21:27 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host graphite2004
  • 21:26 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host graphite2004
  • 21:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1075.eqiad.wmnet with reason: host reimage
  • 21:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:13 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1075.eqiad.wmnet with OS bullseye
  • 21:11 cstone: civicrm upgraded from 92467234 to c228e3d7
  • 21:05 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@33afb85]: initial scap deploy to an-airflow1004, take 3 - T312858 (duration: 00m 18s)
  • 21:05 otto@deploy1002: Started deploy [airflow-dags/platform_eng@33afb85]: initial scap deploy to an-airflow1004, take 3 - T312858
  • 21:01 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.25 refs T314186 (duration: 08m 02s)
  • 20:54 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@da511ee]: initial scap deploy to an-airflow1004, take 2 - T312858 (duration: 01m 05s)
  • 20:53 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.25 refs T314186
  • 20:53 otto@deploy1002: Started deploy [airflow-dags/platform_eng@da511ee]: initial scap deploy to an-airflow1004, take 2 - T312858
  • 20:47 cjming: end of UTC late backport window
  • 20:45 cjming@deploy1002: Finished scap: Backport for gerrit:823268 Update sticky header config for idwiki, viwiki A/B experiment (duration: 06m 44s)
  • 20:42 otto@deploy1002: Finished deploy [airflow-dags/platform_eng@eba3ff8]: initial scap deploy to an-airflow1004 - T312858 (duration: 02m 30s)
  • 20:39 otto@deploy1002: Started deploy [airflow-dags/platform_eng@eba3ff8]: initial scap deploy to an-airflow1004 - T312858
  • 20:39 cjming@deploy1002: Started scap: Backport for gerrit:823268 Update sticky header config for idwiki, viwiki A/B experiment
  • 20:35 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1055.eqiad.wmnet with OS bullseye
  • 20:28 cjming@deploy1002: Finished scap: Backport for gerrit:823658 mediawikiwiki: set $wgCdnMatchParameterOrder to false (duration: 08m 54s)
  • 20:26 mutante: mw1406 - sudo systemctl start php7.2-fpm_check_restart
  • 20:19 cjming@deploy1002: Started scap: Backport for gerrit:823658 mediawikiwiki: set $wgCdnMatchParameterOrder to false
  • 20:18 ori: removed /var/lock/scap.operations_mediawiki-config.lock on deploy1002
  • 20:16 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1055.eqiad.wmnet with reason: host reimage
  • 20:14 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 20:14 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1055.eqiad.wmnet with reason: host reimage
  • 20:13 cjming@deploy1002: scap failed: LockFailedError Failed to acquire lock "/var/lock/scap.operations_mediawiki-config.lock"; owner is "demon"; reason is "all wikis to 1.39.0-wmf.23 refs T314186" (duration: 00m 00s)
  • 19:58 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1055.eqiad.wmnet with OS bullseye
  • 19:53 dancy@deploy1002: backport aborted: (duration: 00m 36s)
  • 19:53 demon@deploy1002: stage-train aborted: (duration: 07m 00s)
  • 19:53 demon@deploy1002: deploy-promote aborted: (duration: 05m 35s)
  • 19:53 demon@deploy1002: sync-world aborted: testwikis wikis to 1.39.0-wmf.25 refs T314186 (duration: 03m 39s)
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32408 and previous config saved to /var/cache/conftool/dbconfig/20220816-195115-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 19:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 19:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32407 and previous config saved to /var/cache/conftool/dbconfig/20220816-195043-ladsgroup.json
  • 19:49 demon@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.25 refs T314186
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32406 and previous config saved to /var/cache/conftool/dbconfig/20220816-193537-ladsgroup.json
  • 19:25 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 19:21 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1025.eqiad.wmnet with OS bullseye
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P32405 and previous config saved to /var/cache/conftool/dbconfig/20220816-192031-ladsgroup.json
  • 19:19 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 19:13 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e]: Full deploy after last week's interrupted deployment. This syncs the latest refinery to all targets. an-launcher1002 already has these files. (duration: 24m 46s)
  • 19:07 demon@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.39.0-wmf.23 refs T314186
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32404 and previous config saved to /var/cache/conftool/dbconfig/20220816-190525-ladsgroup.json
  • 19:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:04 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 19:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1076.eqiad.wmnet with OS bullseye
  • 18:58 demon@deploy1002: Pruned MediaWiki: 1.39.0-wmf.22 (duration: 02m 02s)
  • 18:56 demon@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.24 refs T314186 (duration: 35m 39s)
  • 18:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:48 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e]: Full deploy after last week's interrupted deployment. This syncs the latest refinery to all targets. an-launcher1002 already has these files.
  • 18:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1076.eqiad.wmnet with reason: host reimage
  • 18:40 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1076.eqiad.wmnet with reason: host reimage
  • 18:37 jynus: restore x2 codfw replication T315271
  • 18:26 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1076.eqiad.wmnet with OS bullseye
  • 18:20 demon@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.24 refs T314186
  • 18:08 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1078.eqiad.wmnet with OS bullseye
  • 17:51 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 17:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1078.eqiad.wmnet with reason: host reimage
  • 17:40 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1078.eqiad.wmnet with reason: host reimage
  • 17:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1025.eqiad.wmnet with OS bullseye
  • 17:27 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1078.eqiad.wmnet with OS bullseye
  • 17:00 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
  • 16:57 ryankemper: [WDQS] `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph`
  • 16:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1049.eqiad.wmnet with OS bullseye
  • 16:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1049.eqiad.wmnet with reason: host reimage
  • 16:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1049.eqiad.wmnet with reason: host reimage
  • 16:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1049.eqiad.wmnet with OS bullseye
  • 16:13 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:02 btullis@deploy1002: Finished deploy [airflow-dags/analytics@3c998da]: (no justification provided) (duration: 00m 12s)
  • 16:02 btullis@deploy1002: Started deploy [airflow-dags/analytics@3c998da]: (no justification provided)
  • 15:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
  • 15:48 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
  • 15:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1074.eqiad.wmnet with OS bullseye
  • 15:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
  • 15:29 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
  • 15:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1074.eqiad.wmnet with reason: host reimage
  • 15:23 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1074.eqiad.wmnet with reason: host reimage
  • 15:12 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
  • 15:10 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1074.eqiad.wmnet with OS bullseye
  • 15:07 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
  • 15:07 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
  • 15:07 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
  • 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route-jayme (exit_code=0)
  • 14:30 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1077.eqiad.wmnet with OS bullseye
  • 14:26 jayme@cumin1001: START - Cookbook sre.discovery.service-route-jayme
  • 14:13 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1077.eqiad.wmnet with reason: host reimage
  • 14:10 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1077.eqiad.wmnet with reason: host reimage
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:57 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1077.eqiad.wmnet with OS bullseye
  • 13:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1057.eqiad.wmnet with OS bullseye
  • 13:55 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Config: jawiki: Restrict abusefilter log view to "abusefilter-modify" user (T315199) (duration: 03m 12s)
  • 13:41 taavi: UTC afternoon deploys done
  • 13:40 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: jawiki: Restrict abusefilter log view to "abusefilter-modify" user (T315199) (duration: 03m 21s)
  • 13:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 13:38 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 13:38 jayme@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=1)
  • 13:38 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1057.eqiad.wmnet with reason: host reimage
  • 13:33 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1057.eqiad.wmnet with reason: host reimage
  • 13:24 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 13:24 jayme@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 13:24 jayme@cumin1001: START - Cookbook sre.discovery.service-route
  • 13:24 taavi@deploy1002: Synchronized wmf-config: Config: kowiki: Change logo for 600k articles (T315127) (duration: 03m 11s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 taavi@deploy1002: Synchronized static/images: Config: kowiki: Add logo (legacy vector and vector-2022) for 600k articles (T315127) (duration: 03m 29s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1057.eqiad.wmnet with OS bullseye
  • 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 13:04 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 11:24 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 11:24 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 11:08 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 11:03 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 11:02 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 10:53 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 10:50 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 10:49 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:49 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:40 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 10:38 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 10:34 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 10:30 jelto: reimaging gitlab2003 (insetup) to test partman recipe from gerrit:823115 - T274463
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 20:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 09:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32402 and previous config saved to /var/cache/conftool/dbconfig/20220816-074259-ladsgroup.json
  • 07:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32401 and previous config saved to /var/cache/conftool/dbconfig/20220816-074239-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32400 and previous config saved to /var/cache/conftool/dbconfig/20220816-072733-ladsgroup.json
  • 07:26 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2067.codfw.wmnet
  • 07:26 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2067.codfw.wmnet
  • 07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 07:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
  • 07:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
  • 07:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P32399 and previous config saved to /var/cache/conftool/dbconfig/20220816-071227-ladsgroup.json
  • 06:58 hashar@deploy1002: Finished deploy [integration/docroot@c142ba7]: Drop archived wikibase-vuejs-components storybook - T309872 (duration: 00m 10s)
  • 06:58 hashar@deploy1002: Started deploy [integration/docroot@c142ba7]: Drop archived wikibase-vuejs-components storybook - T309872
  • 06:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32398 and previous config saved to /var/cache/conftool/dbconfig/20220816-065721-ladsgroup.json
  • 06:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
  • 06:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maint
  • 06:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P32397 and previous config saved to /var/cache/conftool/dbconfig/20220816-062955-ladsgroup.json
  • 06:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maint work on old s1 master (T312984 T312863 T310011 T309311 T60674 T298560 T298555 T310485 T301312)
  • 06:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maint work on old s1 master (T312984 T312863 T310011 T309311 T60674 T298560 T298555 T310485 T301312)
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1163 T314380', diff saved to https://phabricator.wikimedia.org/P32396 and previous config saved to /var/cache/conftool/dbconfig/20220816-061413-ladsgroup.json
  • 06:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1118 to s1 primary and set section read-write T314380', diff saved to https://phabricator.wikimedia.org/P32395 and previous config saved to /var/cache/conftool/dbconfig/20220816-060530-ladsgroup.json
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T314380', diff saved to https://phabricator.wikimedia.org/P32394 and previous config saved to /var/cache/conftool/dbconfig/20220816-060455-ladsgroup.json
  • 06:04 Amir1: Starting s1 eqiad failover from db1163 to db1118 - T314380
  • 05:43 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=(appservers|api)-ro
  • 05:43 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=(appservers|api)-ro
  • 05:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1118 with weight 0 T314380', diff saved to https://phabricator.wikimedia.org/P32393 and previous config saved to /var/cache/conftool/dbconfig/20220816-053534-ladsgroup.json
  • 05:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s1 T314380
  • 05:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s1 T314380
  • 05:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db[2142-2143].codfw.wmnet with reason: After-canary
  • 05:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db[2142-2143].codfw.wmnet with reason: After-canary
  • 04:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 04:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:45 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:45 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:44 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 04:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1059.eqiad.wmnet with OS bullseye
  • 04:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1059.eqiad.wmnet with reason: host reimage
  • 04:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1059.eqiad.wmnet with reason: host reimage
  • 03:57 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1059.eqiad.wmnet with OS bullseye
  • 03:56 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
  • 03:55 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage - ryankemper@cumin1001 - T289135
  • 01:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddumps1001.wikimedia.org with OS bullseye
  • 00:18 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: replaceableSettings g 820247 (duration: 03m 18s)
  • 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:13 tstarling@deploy1002: Synchronized tests: config tests, for consistency g 820247 (duration: 03m 22s)
  • 00:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-15

  • 23:20 mutante: phab2002 - manually removing service IP addresses for git-ssh.codfw.wikimedia.org which were added by puppet even after gerrit:823220 (!) T280597
  • 22:59 mutante: search-loader1001 - killed puppet process that had been running since May
  • 22:52 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddumps1001.wikimedia.org with reason: host reimage
  • 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddumps1001.wikimedia.org with OS bullseye
  • 22:33 mutante: rsyncing /srv/repos and /srv/dumps from phab1001 to phab2002 before applying prod puppet role (T313360)
  • 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1083.eqiad.wmnet with OS bullseye
  • 21:54 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Revert "Enable sticky header edit A/B test for idwiki + viwiki"" (duration: 03m 37s)
  • 21:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:45 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1083.eqiad.wmnet with reason: host reimage
  • 21:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1083.eqiad.wmnet with reason: host reimage
  • 21:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:42 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.es6: Backport: Sticky header AB test bucketing for 2 treatment buckets (T312573) (duration: 03m 05s)
  • 21:34 ejegg: payments-wiki upgraded from 41709763 to f9f91f1f
  • afk: payments-wiki rolled back to 41709763
  • 21:29 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1083.eqiad.wmnet with OS bullseye
  • 21:22 ejegg: payments-wiki upgraded from 41709763 to f9f91f1f
  • 21:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1080.eqiad.wmnet with OS bullseye
  • 20:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Revert "Enable sticky header edit A/B test for idwiki + viwiki" (duration: 03m 15s)
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1080.eqiad.wmnet with reason: host reimage
  • 20:48 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1080.eqiad.wmnet with reason: host reimage
  • 20:35 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1080.eqiad.wmnet with OS bullseye
  • 20:33 cjming: end of UTC late backport window
  • 20:31 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments: Backport: WelcomeSurvey/VariantHooks: Change hook used for redirection (T313064) (duration: 04m 37s)
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable sticky header edit A/B test for idwiki + viwiki (T312295) (duration: 03m 30s)
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T314041)', diff saved to https://phabricator.wikimedia.org/P32391 and previous config saved to /var/cache/conftool/dbconfig/20220815-193541-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T314041)', diff saved to https://phabricator.wikimedia.org/P32390 and previous config saved to /var/cache/conftool/dbconfig/20220815-193520-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32389 and previous config saved to /var/cache/conftool/dbconfig/20220815-192014-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P32388 and previous config saved to /var/cache/conftool/dbconfig/20220815-190508-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T314041)', diff saved to https://phabricator.wikimedia.org/P32387 and previous config saved to /var/cache/conftool/dbconfig/20220815-185002-ladsgroup.json
  • 18:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1081.eqiad.wmnet with OS bullseye
  • 18:40 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@230a820]: include additional debugging information in HivePartitionRangeSensor logs (duration: 02m 08s)
  • 18:38 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@230a820]: include additional debugging information in HivePartitionRangeSensor logs
  • 18:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1081.eqiad.wmnet with reason: host reimage
  • 18:31 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ms-be2067.codfw.wmnet
  • 18:29 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1081.eqiad.wmnet with reason: host reimage
  • 18:24 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ms-be2067.codfw.wmnet
  • 18:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1081.eqiad.wmnet with OS bullseye
  • 18:07 herron: thanos compact process was hung, forced thanos-compact restart on thanos-fe2001
  • 17:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1052.eqiad.wmnet with OS bullseye
  • 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1052.eqiad.wmnet with reason: host reimage
  • 17:29 pt1979@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 17:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
  • 17:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1052.eqiad.wmnet with reason: host reimage
  • 17:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ms-be2067.codfw.wmnet
  • 17:28 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-be2067.codfw.wmnet
  • 17:24 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@d4137b5]: increase subgraph query SLA and remove same from drop_old_data (duration: 02m 17s)
  • 17:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@d4137b5]: increase subgraph query SLA and remove same from drop_old_data
  • 17:17 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1052.eqiad.wmnet with OS bullseye
  • 17:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1082.eqiad.wmnet with OS bullseye
  • 16:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1082.eqiad.wmnet with reason: host reimage
  • 16:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1082.eqiad.wmnet with reason: host reimage
  • 16:32 damilare: payments-wiki upgraded from 0894d75a to 41709763
  • 16:27 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 16:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 16:23 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1082.eqiad.wmnet with OS bullseye
  • 16:17 dancy@deploy1002: Installation of scap version "4.13.0" completed for 553 hosts
  • 16:17 dancy@deploy1002: Installing scap version "4.13.0" for 553 hosts
  • 16:14 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts logstash2003.codfw.wmnet
  • 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts logstash2003.codfw.wmnet
  • 15:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
  • 15:32 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2067.codfw.wmnet with reason: disk fault investigation
  • 15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
  • 15:31 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
  • 15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
  • 15:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
  • 15:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2032.codfw.wmnet
  • 15:31 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2032.codfw.wmnet
  • 15:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1068.eqiad.wmnet with OS bullseye
  • 14:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1068.eqiad.wmnet with reason: host reimage
  • 14:36 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1068.eqiad.wmnet with reason: host reimage
  • 14:26 hnowlan@deploy1002: Finished deploy [restbase/deploy@a571f9a]: Add blwiki T310874 (duration: 15m 42s)
  • 14:23 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1068.eqiad.wmnet with OS bullseye
  • 14:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
  • 14:10 hnowlan@deploy1002: Started deploy [restbase/deploy@a571f9a]: Add blwiki T310874
  • 14:10 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2032.codfw.wmnet with reason: RAID battery failure
  • 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1070.eqiad.wmnet with OS bullseye
  • 13:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1070.eqiad.wmnet with reason: host reimage
  • 13:46 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1070.eqiad.wmnet with reason: host reimage
  • 13:34 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1070.eqiad.wmnet with OS bullseye
  • 13:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: de81bcb: testwikidatawiki: Add wikidata as import source (T315211) (duration: 03m 26s)
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e277223: Revert "Revert "Remove WikibaseTermboxInteraction $wgEventLoggingSchemas entry"" (T290303) (duration: 03m 29s)
  • 13:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:03 Emperor: pd 1I:1:1 modify disablepd forced on ms-be2028 T315213
  • 07:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:17 urbanecm: UTC morning B&C window done
  • 07:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a454d3b: Pin wgCheckUserLogReasonMigrationStage to read and write old (T233004) (duration: 03m 16s)
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 43cd5ef: Add bnwiki in wgImportSources to bnwikibooks (T314820) (duration: 03m 05s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T314041)', diff saved to https://phabricator.wikimedia.org/P32386 and previous config saved to /var/cache/conftool/dbconfig/20220815-070955-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:08 urbanecm: mwscript resetAuthenticationThrottle.php --wiki=cswiki --signup --ip='194.31.191.20' # T315141
  • 07:06 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: 7c2a393ee: dc0d62a3: 6f687bcfc: Update throttle rules (T315182, T315141) (duration: 03m 21s)
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32385 and previous config saved to /var/cache/conftool/dbconfig/20220815-023538-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32384 and previous config saved to /var/cache/conftool/dbconfig/20220815-022032-ladsgroup.json
  • 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32383 and previous config saved to /var/cache/conftool/dbconfig/20220815-020526-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32382 and previous config saved to /var/cache/conftool/dbconfig/20220815-015020-ladsgroup.json

2022-08-14

  • 08:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32380 and previous config saved to /var/cache/conftool/dbconfig/20220814-085443-ladsgroup.json
  • 08:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 08:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance

2022-08-13

  • 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32379 and previous config saved to /var/cache/conftool/dbconfig/20220813-133713-ladsgroup.json
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32378 and previous config saved to /var/cache/conftool/dbconfig/20220813-132207-ladsgroup.json
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P32377 and previous config saved to /var/cache/conftool/dbconfig/20220813-130701-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32376 and previous config saved to /var/cache/conftool/dbconfig/20220813-125156-ladsgroup.json

2022-08-12

  • 23:41 mutante: wikistats-bullseye:~$ /usr/lib/wikistats/update.php wp prefix blk ; /usr/lib/wikistats/update.php wp prefix kcg T315121
  • 23:38 mutante: [mwmaint1002:~] $ sudo systemctl start mediawiki_job_initsitestats.timer T315121
  • 22:14 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 21:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1071.eqiad.wmnet with OS bullseye
  • 21:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host clouddb2002-dev.codfw.wmnet with OS bullseye
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1071.eqiad.wmnet with reason: host reimage
  • 21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1071.eqiad.wmnet with OS bullseye
  • 21:10 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
  • 21:06 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb2002-dev.codfw.wmnet with reason: host reimage
  • 21:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1053.eqiad.wmnet with OS bullseye
  • 20:50 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host clouddb2002-dev.codfw.wmnet with OS bullseye
  • 20:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
  • 20:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1053.eqiad.wmnet with reason: host reimage
  • 20:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1053.eqiad.wmnet with OS bullseye
  • 20:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1048.eqiad.wmnet with OS bullseye
  • 19:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
  • 19:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1048.eqiad.wmnet with reason: host reimage
  • 19:42 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1048.eqiad.wmnet with OS bullseye
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32375 and previous config saved to /var/cache/conftool/dbconfig/20220812-193822-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32374 and previous config saved to /var/cache/conftool/dbconfig/20220812-193801-ladsgroup.json
  • 19:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1054.eqiad.wmnet with OS bullseye
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32373 and previous config saved to /var/cache/conftool/dbconfig/20220812-192255-ladsgroup.json
  • 19:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
  • 19:09 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1054.eqiad.wmnet with reason: host reimage
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P32372 and previous config saved to /var/cache/conftool/dbconfig/20220812-190749-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
  • 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet with reason: Maint
  • 18:54 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1054.eqiad.wmnet with OS bullseye
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32371 and previous config saved to /var/cache/conftool/dbconfig/20220812-185243-ladsgroup.json
  • 18:48 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1066.eqiad.wmnet with OS bullseye
  • 18:25 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
  • 18:22 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1066.eqiad.wmnet with reason: host reimage
  • 18:08 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1066.eqiad.wmnet with OS bullseye
  • 18:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1064.eqiad.wmnet with OS bullseye
  • 17:42 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
  • 17:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1064.eqiad.wmnet with reason: host reimage
  • 17:24 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1064.eqiad.wmnet with OS bullseye
  • 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts netmon2002.wikimedia.org
  • 17:21 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon2002.wikimedia.org
  • 17:19 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
  • 17:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 17:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 16:42 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
  • 16:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1067.eqiad.wmnet with OS bullseye
  • 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2003-dev.wikimedia.org
  • 16:21 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 16:11 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2003-dev.wikimedia.org
  • 16:08 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 16:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
  • 15:58 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1067.eqiad.wmnet with reason: host reimage
  • 15:43 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1067.eqiad.wmnet with OS bullseye
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 15:31 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 15:31 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['netmon2002.wikimedia.org']
  • 15:07 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts netmon1002.wikimedia.org
  • 15:07 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts netmon1002.wikimedia.org
  • 15:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1061.eqiad.wmnet with OS bullseye
  • 14:46 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
  • 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=varnish-fe
  • 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-be
  • 14:46 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2042.codfw.wmnet,service=ats-tls
  • 14:43 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1061.eqiad.wmnet with reason: host reimage
  • 14:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
  • 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1061.eqiad.wmnet with OS bullseye
  • 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maint
  • 14:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1063.eqiad.wmnet with OS bullseye
  • 14:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
  • 14:02 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1063.eqiad.wmnet with reason: host reimage
  • 13:47 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1063.eqiad.wmnet with OS bullseye
  • 13:41 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 06:01 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic10[8-9][0-9].*
  • 05:54 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=elastic110.*
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T312863)', diff saved to https://phabricator.wikimedia.org/P32369 and previous config saved to /var/cache/conftool/dbconfig/20220812-010312-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32368 and previous config saved to /var/cache/conftool/dbconfig/20220812-010233-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32367 and previous config saved to /var/cache/conftool/dbconfig/20220812-004727-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P32366 and previous config saved to /var/cache/conftool/dbconfig/20220812-003221-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32365 and previous config saved to /var/cache/conftool/dbconfig/20220812-001715-ladsgroup.json

2022-08-11

  • 21:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:04 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: revert Define default value for "wmgSiteLogoVariants" (T305692 T308620) (duration: 03m 15s)
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:47 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Define default value for "wmgSiteLogoVariants" (T305692 T308620) (duration: 03m 07s)
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:29 thcipriani@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/modules/ve-mw/preinit/ve.init.mw.DesktopArticleTarget.init.js: Backport: Do not show incompatible skin warning when page is not editable (T314952) (duration: 03m 16s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:23 mutante: merging change on prod phabricator host to allow scap deployment, part 1
  • 19:42 damilare: payments-wiki upgraded from cf5e1848 to 0894d75a
  • 19:41 mutante: disabling puppet on C:profile::phabricator::main
  • 19:20 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 17:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:58 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Fix labtestwiki database name servers (T310795) (duration: 03m 39s)
  • 17:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:52 sukhe: testing ATS 9.1.3-1wm1 on cp3064: T309651
  • 17:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:46 sukhe: testing ATS 9.1.3-1wm1 on cp3064: T309651
  • 17:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host netmon2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:40 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 sukhe: testing ATS 9.1.3-1wm1 on cp1090: T309651
  • 17:36 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host netmon2002
  • 17:34 pt1979@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host netmon2002
  • 17:33 sukhe: testing ATS 9.1.3-1wm1 on cp3065: T309651
  • 17:28 sukhe: testing ATS 9.1.3-1wm1 on cp1089: T309651
  • 17:19 bking@cumin1001: conftool action : set/weight=10:pooled=no; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
  • 17:18 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=elasticsearch-omega-ssl,name=elastic1100.eqiad.wmnet
  • 17:15 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
  • 16:35 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 16:30 mvernon@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 16:29 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T309810
  • 16:29 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T309810
  • 16:26 inflatador: bking@elastic1054 attempting to ban elastic1100-1102 from cluster due to firewall issues
  • 16:13 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: service=search-omega-https,name=elastic1100.eqiad.wmnet
  • 16:12 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=elastic1100
  • 15:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P32364 and previous config saved to /var/cache/conftool/dbconfig/20220811-145823-ladsgroup.json
  • 14:55 inflatador: bking@cumin1001 running puppet agent across eqiad elastic hosts
  • 14:48 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P32362 and previous config saved to /var/cache/conftool/dbconfig/20220811-144318-ladsgroup.json
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P32361 and previous config saved to /var/cache/conftool/dbconfig/20220811-142813-ladsgroup.json
  • 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1003.wikimedia.org
  • 14:28 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:24 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:19 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1003.wikimedia.org
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1004.wikimedia.org
  • 14:18 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:17 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks fields in s2 (T312865) (duration: 03m 25s)
  • 14:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:15 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:13 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P32360 and previous config saved to /var/cache/conftool/dbconfig/20220811-141309-ladsgroup.json
  • 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:11 awight: EU backport window complete
  • 14:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:10 awight@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/includes/CommentFormatter.php: Backport: CommentFormatter: Set 'data-mw-comment' even when reply tool disabled (T314707) (duration: 03m 31s)
  • 14:09 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1004.wikimedia.org
  • 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:52 mvernon@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: upgrade to 3.11.13 T309896 - mvernon@cumin2002
  • 13:50 awight@deploy1002: Synchronized wmf-config: Config: Revert "Revert "testwiki: Add mediawiki.web_ui.interactions stream"" (duration: 03m 10s)
  • 13:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1060.eqiad.wmnet with OS bullseye
  • 13:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:36 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: trwikiquote: Install WikiLove extension (T314895) (duration: 03m 30s)
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:33 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host logstash2003.codfw.wmnet
  • 13:25 awight@deploy1002: Synchronized static/images: Config: Revert "trwiki: Change old and new vector logos for 500k articles" (part 3) (duration: 03m 09s)
  • 13:21 awight@deploy1002: Synchronized logos/: Config: Revert "trwiki: Change old and new vector logos for 500k articles" (part 2) (duration: 03m 09s)
  • 13:19 topranks: merging CR821781 to expose additional network info in puppet facts
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:18 awight@deploy1002: Synchronized wmf-config/: Config: Revert "trwiki: Change old and new vector logos for 500k articles" (part 1) (duration: 03m 13s)
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:14 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1060.eqiad.wmnet with reason: host reimage
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable editor line numbering on all namespaces, for twwiki (T302852) (duration: 03m 42s)
  • 12:56 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1060.eqiad.wmnet with OS bullseye
  • 12:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 12:49 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:46 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2018.codfw.wmnet
  • 12:26 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[367].codfw.wmnet
  • 12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 12:17 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:16 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host logstash2003.codfw.wmnet
  • 12:11 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 09:32 godog: arm keyholder on netmon2001
  • 09:09 jbond: update gnutls28 on bullseye systems
  • 09:00 jbond: update unzip
  • 08:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:13 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:12 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
  • 08:06 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
  • 07:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr3-ulsfo:xe-0/1/1
  • 07:57 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr3-ulsfo:xe-0/1/1
  • 07:55 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-wikikube-rw,name=codfw
  • 07:51 vgutierrez: rolling restart of pybal in eqsin and ulsfo
  • 07:24 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=restbase-async,name=eqiad
  • 07:24 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox-timeline
  • 07:23 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=inference
  • 07:19 _joe_: pooling all services in codfw
  • 07:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T312863)', diff saved to https://phabricator.wikimedia.org/P32357 and previous config saved to /var/cache/conftool/dbconfig/20220811-070312-ladsgroup.json
  • 07:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 07:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312863)', diff saved to https://phabricator.wikimedia.org/P32356 and previous config saved to /var/cache/conftool/dbconfig/20220811-070252-ladsgroup.json
  • 06:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32355 and previous config saved to /var/cache/conftool/dbconfig/20220811-064746-ladsgroup.json
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P32354 and previous config saved to /var/cache/conftool/dbconfig/20220811-063240-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T312863)', diff saved to https://phabricator.wikimedia.org/P32353 and previous config saved to /var/cache/conftool/dbconfig/20220811-061734-ladsgroup.json
  • 06:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
  • 06:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maint
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1162 (T314368 T298555 T312863 T310011 T309311 T60674 T298560 T303603 T310485)', diff saved to https://phabricator.wikimedia.org/P32352 and previous config saved to /var/cache/conftool/dbconfig/20220811-060625-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 primary and set section read-write T314368', diff saved to https://phabricator.wikimedia.org/P32351 and previous config saved to /var/cache/conftool/dbconfig/20220811-060113-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T314368', diff saved to https://phabricator.wikimedia.org/P32350 and previous config saved to /var/cache/conftool/dbconfig/20220811-060042-ladsgroup.json
  • 06:00 Amir1: Starting s2 eqiad failover from db1162 to db1122 - T314368
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 T314368', diff saved to https://phabricator.wikimedia.org/P32349 and previous config saved to /var/cache/conftool/dbconfig/20220811-051913-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s2 T314368
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s2 T314368
  • m: chown -R librenms /srv/librenms/rrd/ on netmon1003 T314972
  • 03:51 cwhite: chown librenms /srv/librenms/rrd/* on netmon1003 T314972
  • 02:55 ejegg: civicrm upgraded from 1f91ac2d to 92467234
  • 02:46 ejegg: updated process-control yaml files with @wmff alias
  • 02:08 ejegg: civicrm rolled back from 92467234 to 1f91ac2d
  • 02:05 ejegg: civicrm upgraded from 1f91ac2d to 92467234
  • 01:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:38 tstarling@deploy1002: Synchronized wmf-config/logging.php: (no justification provided) (duration: 03m 25s)
  • 01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 01:19 tstarling@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=varnish-fe
  • 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-be
  • 00:58 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2042.codfw.wmnet,service=ats-tls
  • 00:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow
  • 00:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp2042.codfw.wmnet with reason: host down; depooled and will debug tomorrow

2022-08-10

  • 21:25 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1016.eqiad.wmnet
  • 21:23 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
  • 21:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: T309810
  • 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: T309810
  • 21:09 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: T309810
  • 21:09 bking@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1101-1102].eqiad.wmnet with reason: T309810
  • 21:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:00 cjming: end of UTC late backport window
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:59 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgEnableMWSuggest (duration: 03m 04s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:56 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new topic tool on dewiki (T313699) (duration: 03m 01s)
  • 20:34 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: testwiki: set $wgCdnMatchParameterOrder to false (T314868) (duration: 03m 20s)
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:08 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Start writing to cuc_actor everywhere except s4 and s8 (T233004) (duration: 03m 15s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:51 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2053-2054].codfw.wmnet
  • 19:51 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2053-2054].codfw.wmnet
  • 19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2019-2020].codfw.wmnet
  • 19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2019-2020].codfw.wmnet
  • 19:35 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
  • 19:35 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
  • 19:34 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2036.codfw.wmnet
  • 19:34 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2036.codfw.wmnet
  • 19:28 sukhe: testing ATS 9.1.3-1wm1 on cp4026: T309651
  • 19:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1087.eqiad.wmnet with OS bullseye
  • 19:06 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1086.eqiad.wmnet with OS bullseye
  • 18:55 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1087.eqiad.wmnet with reason: host reimage
  • 18:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1086.eqiad.wmnet with reason: host reimage
  • 18:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1087.eqiad.wmnet with reason: host reimage
  • 18:49 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1086.eqiad.wmnet with reason: host reimage
  • 18:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:38 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1087.eqiad.wmnet with OS bullseye
  • 18:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1086.eqiad.wmnet with OS bullseye
  • 18:22 urandom: truncating Cassandra hints (eqiad datacenter) -- T314941
  • 18:13 urandom: truncating codfw Cassandra hints (eqiad datacenter) -- T314941
  • 18:07 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2005.codfw.wmnet
  • 18:07 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kafka-main2005.codfw.wmnet
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool D8 DBs after PDU maint (T310146)', diff saved to https://phabricator.wikimedia.org/P32346 and previous config saved to /var/cache/conftool/dbconfig/20220810-180529-ladsgroup.json
  • 17:42 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e]: Add missing changes to the deletion script - T270433 - [analytics/refinery@6e47e0e] (duration: 05m 28s)
  • 17:39 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labweb1002.wikimedia.org
  • 17:39 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e]: Add missing changes to the deletion script - T270433 - [analytics/refinery@6e47e0e]
  • 17:35 fnegri@cumin1001: START - Cookbook sre.dns.netbox
  • 17:34 otto@deploy1002: Finished deploy [analytics/refinery@6e47e0e] (hadoop-test): Add missing changes to the deletion script - T270433 - TEST [analytics/refinery@6e47e0e] (duration: 04m 19s)
  • 17:30 fnegri@cumin1001: START - Cookbook sre.hosts.decommission for hosts labweb1002.wikimedia.org
  • 17:30 otto@deploy1002: Started deploy [analytics/refinery@6e47e0e] (hadoop-test): Add missing changes to the deletion script - T270433 - TEST [analytics/refinery@6e47e0e]
  • 17:09 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 17:08 otto@deploy1002: Started deploy [analytics/refinery@d4dd7e4] (hadoop-test): Add safety limits to refinery-drop-older-than - T270433 - TEST [analytics/refinery@d4dd7e4]
  • 17:06 sukhe: testing ATS 9.1.3-1wm1 on cp4032: T309651
  • 17:06 urandom: flushing RESTBase Cassandra tables -row B- to (temporarily) free instance-data space -- T314941
  • 17:05 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on krb2001.codfw.wmnet with reason: btullis codfw maintenance
  • 17:05 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on krb2001.codfw.wmnet with reason: btullis codfw maintenance
  • 17:04 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gerrit2001.wikimedia.org
  • 17:02 sukhe: testing ATS 9.1.3-1wm1 on cp6008: T309651
  • 16:56 sukhe: testing ATS 9.1.3-1wm1 on cp6016: T309651
  • 16:55 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labweb1001.wikimedia.org
  • 16:55 fnegri@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gerrit2001.wikimedia.org
  • 16:32 dzahn@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:32 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2013-2014].codfw.wmnet
  • 16:31 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2013-2014].codfw.wmnet
  • 16:31 jelto: kubectl uncordon kubernetes2014.codfw.wmnet
  • 16:31 fnegri@cumin1001: START - Cookbook sre.dns.netbox
  • 16:30 jelto: kubectl uncordon kubernetes2013.codfw.wmnet
  • 16:29 urandom: restarting Cassandra (RESTBase) -row A- to apply r822110 -- T314941
  • 16:27 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 16:25 fnegri@cumin1001: START - Cookbook sre.hosts.decommission for hosts labweb1001.wikimedia.org
  • 16:23 mutante: shutting down gerrit2001
  • 16:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2034-2035].codfw.wmnet
  • 16:23 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2034-2035].codfw.wmnet
  • 16:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2016-2018].codfw.wmnet
  • 16:22 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2016-2018].codfw.wmnet
  • 16:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 16:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=sessionstore2003.codfw.wmnet
  • 16:13 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm1_amd64.changes: T309651
  • 16:13 dzahn@cumin2002: START - Cookbook sre.hosts.decommission for hosts gerrit2001.wikimedia.org
  • 16:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2039,2050,2056,2059].codfw.wmnet,thanos-be2004.codfw.wmnet with reason: PDU work
  • 16:10 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2039,2050,2056,2059].codfw.wmnet,thanos-be2004.codfw.wmnet with reason: PDU work
  • 16:09 urandom: flushing tables in row D (RESTBase Cassandra cluster) -- T314941
  • 15:54 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gitlab-runner2004.codfw.wmnet
  • 15:54 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for gitlab-runner2004.codfw.wmnet
  • 15:53 sukhe: poweroff cp2041, 42 for PDU upgrade: rack D7
  • 15:51 urandom: flushing tables in row B (RESTBase Cassandra cluster) -- T314941
  • 15:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: PDU maintenance
  • 15:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: PDU maintenance
  • 15:46 urandom: flushing tables in row A (RESTBase Cassandra cluster) -- T314941
  • 15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2012.codfw.wmnet with reason: btullis codfw maintenance
  • 15:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2041-2042].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
  • 15:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2012.codfw.wmnet with reason: btullis codfw maintenance
  • 15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2041-2042].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
  • 15:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2011.codfw.wmnet with reason: btullis codfw maintenance
  • 15:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2011.codfw.wmnet with reason: btullis codfw maintenance
  • 15:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2010.codfw.wmnet with reason: btullis codfw maintenance
  • 15:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2010.codfw.wmnet with reason: btullis codfw maintenance
  • 15:45 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on aqs2009.codfw.wmnet with reason: btullis codfw maintenance
  • 15:45 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on aqs2009.codfw.wmnet with reason: btullis codfw maintenance
  • 15:37 urandom: (ephemerally) increasing hinted hand-off delivery rate limit to 16KB, RESTBase eqiad nodes -- T314941
  • 15:34 jbond: remove puppetmaster[12]002 from production
  • 15:30 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main2004.codfw.wmnet
  • 15:30 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for kafka-main2004.codfw.wmnet
  • 15:20 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2051-2052].codfw.wmnet
  • 15:20 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2051-2052].codfw.wmnet
  • 15:17 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc-gp2003.codfw.wmnet
  • 15:17 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc-gp2003.codfw.wmnet
  • 15:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2033.codfw.wmnet
  • 15:16 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2033.codfw.wmnet
  • 15:14 _joe_: power off krb2002
  • 15:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb2002.codfw.wmnet with reason: PDU maintenance
  • 15:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on krb2002.codfw.wmnet with reason: PDU maintenance
  • 15:13 _joe_: shutting down rdb2010,puppetmaster2002 for d5 maintenance
  • 15:02 jelto: power off mc2035
  • 15:01 jelto: power off mc2034
  • 15:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc2035.codfw.wmnet with reason: PDU swap
  • 15:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc2035.codfw.wmnet with reason: PDU swap
  • 15:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc2034.codfw.wmnet with reason: PDU swap
  • 15:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc2034.codfw.wmnet with reason: PDU swap
  • 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: PDU Maint (T310146)
  • 14:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: PDU Maint (T310146)
  • 14:38 urandom: disabling reserved space on eqiad nodes (RESTBase), /dev/md2 (aka /srv/cassandra/instance-data) -- T314941
  • 14:28 jelto: power off kafka-main2004 gracefully
  • 14:28 hnowlan: shutting down sessionstore2003
  • 14:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=sessionstore2003.codfw.wmnet
  • 14:27 sukhe: power off cp2039, cp2040 for PDU upgrade: rack D
  • 14:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: PDU maintenance
  • 14:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: PDU maintenance
  • 14:25 jelto: power off mc-gp2003
  • 14:25 jelto: power off mc2033
  • 14:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on kafka-main2004.codfw.wmnet with reason: PDU swap
  • 14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on kafka-main2004.codfw.wmnet with reason: PDU swap
  • 14:23 sukhe: depool codfw for PDU upgrade: rack D
  • 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on mc-gp2003.codfw.wmnet with reason: PDU swap
  • 14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on mc-gp2003.codfw.wmnet with reason: PDU swap
  • 14:23 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on mc2033.codfw.wmnet with reason: PDU swap
  • 14:23 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on mc2033.codfw.wmnet with reason: PDU swap
  • 14:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp20[39|40]\.codfw\.wmnet,service=ats-tls
  • 14:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cp[2039-2040].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
  • 14:13 urandom: flushing Cassandra tables, restbase1030
  • 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on cp[2039-2040].codfw.wmnet with reason: shutdown for PDU upgrade: rack D4
  • 14:13 urandom: flushing Cassandra tables, restbase1019
  • 14:12 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2002.wikimedia.org with reason: shutdown for PDU upgrade: rack D4
  • 14:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2002.wikimedia.org with reason: shutdown for PDU upgrade: rack D4
  • 14:11 urandom: flushing Cassandra tables, restbase1017 1018 1021 1024 1025 1026 1028 1029
  • 14:05 urandom: flushing tables, restbase1016
  • 13:52 hnowlan: powered up restbase2018
  • 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on netmon1003.wikimedia.org with reason: pdu
  • 13:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on netmon1003.wikimedia.org with reason: pdu
  • 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on logstash2003.codfw.wmnet with reason: pdu
  • 13:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on logstash2003.codfw.wmnet with reason: pdu
  • 13:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on logstash2029.codfw.wmnet with reason: pdu
  • 13:31 root@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on logstash2029.codfw.wmnet with reason: pdu
  • 13:30 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: T310146
  • 13:30 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: T310146
  • 13:17 elukey: powering on restbase2027
  • 13:12 elukey: powering on restbase2026
  • 13:12 _joe_: powering on restbase2023
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1160 (T312863)', diff saved to https://phabricator.wikimedia.org/P32343 and previous config saved to /var/cache/conftool/dbconfig/20220810-130108-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 12:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[2072,2084-2085].codfw.wmnet with reason: T310146
  • 12:37 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[2072,2084-2085].codfw.wmnet with reason: T310146
  • 12:27 jbond: remove confd from servers that shouldn't have it
  • 12:05 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/Echo/maintenance/removeOrphanedEvents.php: Backport: Run clean ups with removeOrphanedEvents in major batches (T310428) (duration: 03m 32s)
  • 11:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:15 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
  • 10:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:51 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2181-2182].codfw.wmnet with reason: D6 PDU maint (T310146)
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2181-2182].codfw.wmnet with reason: D6 PDU maint (T310146)
  • 10:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on maps2010.codfw.wmnet with reason: PDU maintenance
  • 10:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on maps2010.codfw.wmnet with reason: PDU maintenance
  • 10:26 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 10:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 10:25 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
  • 10:24 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2018.codfw.wmnet with reason: PDU maintenance
  • 10:24 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2018.codfw.wmnet
  • 10:24 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2018.codfw.wmnet with reason: PDU maintenance
  • 10:23 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ml-serve2008.codfw.wmnet with reason: PDU maintenance
  • 10:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ml-serve2008.codfw.wmnet with reason: PDU maintenance
  • 10:20 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ores2009.codfw.wmnet with reason: PDU maintenance
  • 10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ores2009.codfw.wmnet with reason: PDU maintenance
  • 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ores2008.codfw.wmnet with reason: PDU maintenance
  • 10:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on ores2008.codfw.wmnet with reason: PDU maintenance
  • 10:03 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[367].codfw.wmnet
  • 10:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2023,2026-2027].codfw.wmnet with reason: PDU maintenance
  • 10:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2023,2026-2027].codfw.wmnet with reason: PDU maintenance
  • 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: D8 PDU Maint (T310146)
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: D8 PDU Maint (T310146)
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D8 DBs for PDU maint (T310146)', diff saved to https://phabricator.wikimedia.org/P32341 and previous config saved to /var/cache/conftool/dbconfig/20220810-095059-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2101,2130,2140].codfw.wmnet,dbproxy2004.codfw.wmnet with reason: D6 PDU maint (T310146)
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2101,2130,2140].codfw.wmnet,dbproxy2004.codfw.wmnet with reason: D6 PDU maint (T310146)
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D6 dbs (T310146)', diff saved to https://phabricator.wikimedia.org/P32340 and previous config saved to /var/cache/conftool/dbconfig/20220810-093433-ladsgroup.json
  • 09:31 jelto: depool services in codfw for upcoming PDU replacement - T309956
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 8 hosts with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 09:28 jynus: shutdown backup2007 before pdu upgrade T310146
  • 09:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:15 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/maintenance/namespaceDupes.php: Backport: maintenance: Add support for links migration to namespaceDupes.php (T314711) (duration: 03m 18s)
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2093,2120,2129,2172].codfw.wmnet with reason: D5 PDU maint (T310146)
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2093,2120,2129,2172].codfw.wmnet with reason: D5 PDU maint (T310146)
  • 09:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D5 dbs (T310146)', diff saved to https://phabricator.wikimedia.org/P32339 and previous config saved to /var/cache/conftool/dbconfig/20220810-091038-ladsgroup.json
  • 08:49 jynus: shutdown dbprov2003 before pdu upgrade T310146
  • 08:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:48 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2028.codfw.wmnet
  • 08:48 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2028.codfw.wmnet
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P32337 and previous config saved to /var/cache/conftool/dbconfig/20220810-084222-ladsgroup.json
  • 08:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 08:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:35 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks fields in s5 (T312865) (duration: 03m 29s)
  • 08:32 jelto: power off gitlab-runner2004
  • 08:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:30:00 on gitlab-runner2004.codfw.wmnet with reason: PDU swap
  • 08:31 root@cumin1001: START - Cookbook sre.hosts.downtime for 8:30:00 on gitlab-runner2004.codfw.wmnet with reason: PDU swap
  • 08:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be2028.codfw.wmnet with reason: Trying to fix full /
  • 08:28 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be2028.codfw.wmnet with reason: Trying to fix full /
  • 08:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:27 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P32336 and previous config saved to /var/cache/conftool/dbconfig/20220810-082718-ladsgroup.json
  • 08:25 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:25 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:24 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:24 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID cr1-drmrs:xe-0/1/2
  • 08:23 kart_: Run: mwscript namespaceDupes.php arywiki --fix (T291737)
  • 08:13 jynus: restart replication on db1117:m1 T309074
  • 08:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P32335 and previous config saved to /var/cache/conftool/dbconfig/20220810-081213-ladsgroup.json
  • 08:09 kartik@deploy1002: Finished scap: Backport: arywiki: change namespace translations, add unchanged namespaces and add old translations as aliases (T291737) (duration: 10m 37s)
  • 07:59 kartik@deploy1002: Started scap: Backport: arywiki: change namespace translations, add unchanged namespaces and add old translations as aliases (T291737)
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P32334 and previous config saved to /var/cache/conftool/dbconfig/20220810-075708-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P32333 and previous config saved to /var/cache/conftool/dbconfig/20220810-075636-ladsgroup.json
  • 07:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 07:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 07:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:51 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:46 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 07:39 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:34 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 07:33 godog: depool thanos-fe2001 for debugging
  • 07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on testwiki with new MT support from Google (T313296) (duration: 05m 44s)
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 05:24 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on kubernetes[2013-2014].codfw.wmnet with reason: PDU maintenance
  • 05:24 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on kubernetes[2013-2014].codfw.wmnet with reason: PDU maintenance
  • 05:19 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on parse[2016-2020].codfw.wmnet with reason: PDU maintenance
  • 05:19 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on parse[2016-2020].codfw.wmnet with reason: PDU maintenance
  • 05:12 _joe_: starting to shut down servers in codfw for the PDU maintenance
  • 05:09 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 10 hosts with reason: PDU maintenance
  • 05:09 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 10 hosts with reason: PDU maintenance
  • 05:09 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on mc-gp2003.codfw.wmnet with reason: PDU maintenance
  • 05:09 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on mc-gp2003.codfw.wmnet with reason: PDU maintenance
  • 05:06 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on mc2033.codfw.wmnet with reason: PDU maintenance
  • 05:06 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on mc2033.codfw.wmnet with reason: PDU maintenance
  • 05:05 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on 7 hosts with reason: PDU maintenance
  • 05:05 oblivian@cumin1001: START - Cookbook sre.hosts.downtime for 18:00:00 on 7 hosts with reason: PDU maintenance
  • 02:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-09

  • 23:17 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1011.eqiad.wmnet
  • 23:07 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 23:06 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:51 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:51 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:49 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:49 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:46 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1015.eqiad.wmnet
  • 22:31 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:31 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 22:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:02 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 22:02 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 21:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: T310146
  • 21:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2006.codfw.wmnet with reason: T310146
  • 21:53 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 21:52 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 21:50 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 21:49 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 21:43 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 21:43 bking@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 21:43 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 21:43 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 21:43 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 21:43 bking@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 21:08 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:00 bking@cumin1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs1014.eqiad.wmnet
  • 20:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312863)', diff saved to https://phabricator.wikimedia.org/P32332 and previous config saved to /var/cache/conftool/dbconfig/20220809-205548-ladsgroup.json
  • 20:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1014.eqiad.wmnet
  • 20:51 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1014.eqiad.wmnet
  • 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32331 and previous config saved to /var/cache/conftool/dbconfig/20220809-204042-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P32330 and previous config saved to /var/cache/conftool/dbconfig/20220809-202536-ladsgroup.json
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T312863)', diff saved to https://phabricator.wikimedia.org/P32329 and previous config saved to /var/cache/conftool/dbconfig/20220809-201030-ladsgroup.json
  • 19:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: T314890
  • 19:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1014.eqiad.wmnet with reason: T314890
  • 19:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: T314890
  • 19:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: T314890
  • 19:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: T314890
  • 19:55 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1015.eqiad.wmnet with reason: T314890
  • 19:38 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 19:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:06 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1072.eqiad.wmnet with OS bullseye
  • 17:29 vgutierrez: test trafficserver 9.1.2-1wm2 in cp6016 - T309651
  • 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
  • 17:13 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1072.eqiad.wmnet with reason: host reimage
  • 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1072.eqiad.wmnet with OS bullseye
  • 16:54 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:53 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 16:53 bking@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 16:26 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:26 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1069.eqiad.wmnet with OS bullseye
  • 15:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
  • 15:42 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1069.eqiad.wmnet with reason: host reimage
  • 15:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:30 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1069.eqiad.wmnet with OS bullseye
  • 15:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:27 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1058.eqiad.wmnet with OS bullseye
  • 15:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
  • 15:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1058.eqiad.wmnet with reason: host reimage
  • 14:59 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • m: finished running 'homer "status:active" commit "netmon: Add the netmon1003 host as a syslog destination"' in the cumin1001 host. Homer reported no errors.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:50 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1058.eqiad.wmnet with OS bullseye
  • 14:28 bking@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=codfw
  • 13:57 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:57 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • m: Add the new netmon1003 host as a syslog destination in homer templates/common/system.conf https://gerrit.wikimedia.org/r/c/operations/homer/public/+/819124
  • m: Successfully ran '# run-puppet-merge' in the netmon1002 and netmon1003 hosts.
  • m: Running '# run-puppet-agent' in the netmon1003 host
  • m: Running '# run-puppet-agent' in the netmon1002 host
  • 13:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 13:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.force-shard-allocation
  • m: puppet-merge on puppetmaster2004.codfw.wmnet for patch 819179 succeeded
  • m: Set netmon1003 as netmon_server and netmon1002 as a netmon_servers_failover in the Puppet repository https://gerrit.wikimedia.org/r/c/operations/puppet/+/819179
  • m: authdns updated successfully
  • m: Had to revert https://gerrit.wikimedia.org/r/c/operations/dns/+/819177 because I rebased my changes incorrectly, sent the new patch in https://gerrit.wikimedia.org/r/c/operations/dns/+/821746
  • m: running '# authdns-update' in ns0.wikimedia.org
  • m: Flip DNS for LibreNMS and Smokeping from netmon1002 to netmon1003 https://gerrit.wikimedia.org/r/c/operations/dns/+/819177
  • 13:23 jynus: stop replication on db1117:m1 T309074
  • m: netmon1002 to netmon1003 failover
  • 13:17 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:53 vgutierrez: rolling restart of pybal in eqsin - T310070
  • 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:12 vgutierrez: rolling restart of pybal in codfw - T310070
  • 08:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:26 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:24 jynus: starting data check using es1021 and es2021, expect increased read traffic T314559
  • 08:21 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 06:19 Amir1: dbmaint s5@eqiad (T312863 T312984 T310011 T310485)
  • 06:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
  • 06:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1130.eqiad.wmnet with reason: Maint
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1130 T314370', diff saved to https://phabricator.wikimedia.org/P32323 and previous config saved to /var/cache/conftool/dbconfig/20220809-060836-ladsgroup.json
  • 06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1100 to s5 primary and set section read-write T314370', diff saved to https://phabricator.wikimedia.org/P32322 and previous config saved to /var/cache/conftool/dbconfig/20220809-060159-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T314370', diff saved to https://phabricator.wikimedia.org/P32321 and previous config saved to /var/cache/conftool/dbconfig/20220809-060105-ladsgroup.json
  • 06:00 Amir1: Starting s5 eqiad failover from db1130 to db1100 - T314370
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1100 with weight 0 T314370', diff saved to https://phabricator.wikimedia.org/P32320 and previous config saved to /var/cache/conftool/dbconfig/20220809-051251-ladsgroup.json
  • 05:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s5 T314370
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 22 hosts with reason: Primary switchover s5 T314370
  • 02:42 ejegg: SmashPig upgraded from 9b97ea15 to 13e9e9cc
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T312863)', diff saved to https://phabricator.wikimedia.org/P32318 and previous config saved to /var/cache/conftool/dbconfig/20220809-023113-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 02:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32317 and previous config saved to /var/cache/conftool/dbconfig/20220809-023052-ladsgroup.json
  • 02:28 ejegg: payments-wiki upgraded from 6880236d to cf5e1848
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32316 and previous config saved to /var/cache/conftool/dbconfig/20220809-021546-ladsgroup.json
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P32315 and previous config saved to /var/cache/conftool/dbconfig/20220809-020040-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32314 and previous config saved to /var/cache/conftool/dbconfig/20220809-014534-ladsgroup.json

2022-08-08

  • 23:52 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: clean up testwiki experiments T314750 (duration: 03m 19s)
  • 23:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:46 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: clean up testwiki experiments T314750 (duration: 03m 27s)
  • 23:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 23:32 eileen___: config revision changed from f5668044 to 787cd0e0
  • 23:32 eileen___: civicrm upgraded from 497bddf7 to 1f91ac2d
  • 22:16 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic1065.eqiad.wmnet with OS bullseye
  • 21:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
  • 21:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1065.eqiad.wmnet with reason: host reimage
  • 21:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1065.eqiad.wmnet with OS bullseye
  • 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1062.eqiad.wmnet with OS bullseye
  • 20:53 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
  • 20:50 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1062.eqiad.wmnet with reason: host reimage
  • 20:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1062.eqiad.wmnet with OS bullseye
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:29 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 20:28 cjming: end of UTC late backport window
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:27 cjming@deploy1002: Synchronized php-1.39.0-wmf.23/skins/Vector/resources/skins.vector.styles/layouts/grid.less: Backport: Fix grid blowout bug (T314756) (duration: 03m 26s)
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:11 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Disable sticky header edit A/B test for pilot wikis (T312296) (duration: 03m 35s)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1088.eqiad.wmnet with OS bullseye
  • 17:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
  • 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1088.eqiad.wmnet with reason: host reimage
  • 17:00 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1088.eqiad.wmnet with OS bullseye
  • 16:54 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1085.eqiad.wmnet with OS bullseye
  • 16:49 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
  • 16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
  • 16:26 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
  • 16:24 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host elastic1085.eqiad.wmnet with OS bullseye
  • 16:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
  • 16:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1085.eqiad.wmnet with reason: host reimage
  • 16:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 16:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:10 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:09 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 16:04 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1085.eqiad.wmnet with OS bullseye
  • 16:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1084.eqiad.wmnet with OS bullseye
  • 15:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reimage (bullseye upgrade) - ryankemper@cumin1001 - T289135
  • 15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
  • 15:46 sukhe: upload reprepro -C main include bullseye-wikimedia python-pynetbox_6.6.0-1+wmf11u1_amd64.changes
  • 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1084.eqiad.wmnet with reason: host reimage
  • 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
  • 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maint
  • 15:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic1084.eqiad.wmnet with OS bullseye
  • 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: T314256
  • 14:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on cp5001.eqsin.wmnet with reason: depooled: faulty DIMM: T314256
  • 14:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:11 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:56 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 77fd5ab: Growth: Add new rights to wgAvailableRights (duration: 03m 24s)
  • 12:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1102.eqiad.wmnet
  • 12:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:06 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/: 3eaf155: MentorTools: Do not use MentorWeightManager (T314362) (duration: 03m 31s)
  • 12:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1102.eqiad.wmnet
  • 11:21 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2022.codfw.wmnet
  • 11:21 jelto: kubectl uncordon kubernetes2022.codfw.wmnet
  • 10:43 Amir1: Removing db2079 from orchestrator (T313885)
  • 10:39 Amir1: Removing db2079 from zarcillo (T313885)
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2079.codfw.wmnet
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2079.codfw.wmnet
  • 10:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
  • 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2079.codfw.wmnet with reason: Decom
  • 08:41 jbond: deploy libtirpc update
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T312863)', diff saved to https://phabricator.wikimedia.org/P32310 and previous config saved to /var/cache/conftool/dbconfig/20220808-075723-ladsgroup.json
  • 07:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32309 and previous config saved to /var/cache/conftool/dbconfig/20220808-075702-ladsgroup.json
  • 07:53 godog: grow sda/sdb 3 by 100G on thanos-be2001 - T314275
  • 07:50 godog: grow sda/sdb 3 by 100G on thanos-be1004 - T314275
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32308 and previous config saved to /var/cache/conftool/dbconfig/20220808-074156-ladsgroup.json
  • 07:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P32307 and previous config saved to /var/cache/conftool/dbconfig/20220808-072650-ladsgroup.json
  • 07:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:22 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: trwikivoyage: Create rollbacker user group (T314678) (duration: 03m 17s)
  • 07:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:11 elukey: restart rsyslog on ml-serve2007
  • 07:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32306 and previous config saved to /var/cache/conftool/dbconfig/20220808-071144-ladsgroup.json
  • 07:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:09 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable SectionTranslation on 10 Wikipedias where ContentTranslation is default (T308829) (duration: 03m 15s)
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:06 XioNoX: add CSP headers to Netbox - T296356
  • 07:05 elukey: restart rsyslog on ml-serve-ctrl2001

2022-08-07

  • 19:58 taavi: taavi@mwmaint1002 ~ $ echo "https://upload.wikimedia.org/wikipedia/commons/1/15/Keep_tidy_ask.svg" | mwscript purgeList.php --wiki enwiki # T314712
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T312863)', diff saved to https://phabricator.wikimedia.org/P32305 and previous config saved to /var/cache/conftool/dbconfig/20220807-135204-ladsgroup.json
  • 13:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32304 and previous config saved to /var/cache/conftool/dbconfig/20220807-135143-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32303 and previous config saved to /var/cache/conftool/dbconfig/20220807-133637-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P32302 and previous config saved to /var/cache/conftool/dbconfig/20220807-132131-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32301 and previous config saved to /var/cache/conftool/dbconfig/20220807-130625-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T312863)', diff saved to https://phabricator.wikimedia.org/P32300 and previous config saved to /var/cache/conftool/dbconfig/20220807-120610-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32299 and previous config saved to /var/cache/conftool/dbconfig/20220807-120549-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32298 and previous config saved to /var/cache/conftool/dbconfig/20220807-115043-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P32297 and previous config saved to /var/cache/conftool/dbconfig/20220807-113537-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32296 and previous config saved to /var/cache/conftool/dbconfig/20220807-112031-ladsgroup.json

2022-08-06

  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T312863)', diff saved to https://phabricator.wikimedia.org/P32295 and previous config saved to /var/cache/conftool/dbconfig/20220806-175916-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:02 krinkle@deploy1002: Synchronized w/: I9067d4 (duration: 03m 25s)
  • 03:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 02:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-05

  • 22:20 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly (duration: 02m 01s)
  • 22:18 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@71fe016]: Fix schedule_interval for image_recommendation_weekly
  • 17:08 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1195.eqiad.wmnet with OS bullseye
  • 16:54 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1194.eqiad.wmnet with OS bullseye
  • 16:53 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 16:49 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: host reimage
  • 16:41 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 16:37 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1194.eqiad.wmnet with reason: host reimage
  • 16:34 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1195.eqiad.wmnet with OS bullseye
  • 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
  • 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
  • 16:27 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
  • 16:26 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1194.eqiad.wmnet with OS bullseye
  • 16:25 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1193.eqiad.wmnet with OS bullseye
  • 16:21 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host db1192.eqiad.wmnet with OS bullseye
  • 16:12 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@8489923]: T304954: Automate imagesuggestion imports (duration: 02m 03s)
  • 16:11 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
  • 16:11 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :) (duration: 06m 09s)
  • 16:10 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@8489923]: T304954: Automate imagesuggestion imports
  • 16:07 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1193.eqiad.wmnet with reason: host reimage
  • 16:07 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
  • 16:05 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine, now with FORCE :)
  • 16:04 milimetric@deploy1002: Finished deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine (duration: 34m 38s)
  • 16:03 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1192.eqiad.wmnet with reason: host reimage
  • 15:55 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1193.eqiad.wmnet with OS bullseye
  • 15:52 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1191.eqiad.wmnet with OS bullseye
  • 15:51 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1192.eqiad.wmnet with OS bullseye
  • 15:42 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1190.eqiad.wmnet with OS bullseye
  • 15:38 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 15:34 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1191.eqiad.wmnet with reason: host reimage
  • 15:30 milimetric@deploy1002: Started deploy [analytics/refinery@fe7bf9e]: Hotfix for webrequest load refine
  • 15:28 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
  • 15:25 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1190.eqiad.wmnet with reason: host reimage
  • 15:24 jbond: upload trapperkeeper-metrics-clojure to puppet7 component
  • 15:22 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1191.eqiad.wmnet with OS bullseye
  • 15:19 jbond: upload puppetlabs-http-client-clojur to puppet7 component
  • 15:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:14 dancy@deploy1002: Finished scap: Backport for gerrit:820653 scap gitignore: ignore all files under the `scap` directory (duration: 04m 41s)
  • 15:11 jbond: upload jolokia to puppet7 component
  • 15:10 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1185.eqiad.wmnet with OS bullseye
  • 15:09 dancy@deploy1002: Started scap: Backport for gerrit:820653 scap gitignore: ignore all files under the `scap` directory
  • 15:09 jbond: upload test-chuck-clojure to puppet7 component
  • 15:05 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1190.eqiad.wmnet with OS bullseye
  • 15:04 jbond: upload test-check-clojure to puppet7 component
  • 14:57 jbond: upload nippy-clojure to puppet7 component
  • 14:56 pt1979@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
  • 14:52 pt1979@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1185.eqiad.wmnet with reason: host reimage
  • 14:43 jbond: upload fressian to puppet7 component
  • 14:40 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host db1185.eqiad.wmnet with OS bullseye
  • 14:40 jbond: upload test-generative-clojure to puppet7 component
  • 14:35 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:34 jbond: upload data-generators-clojure to puppet7 component
  • 14:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:23 jbond: upload encore-clojure to puppet7 component
  • 14:17 jbond: upload truss-clojure to puppet7 component
  • 14:13 jbond: upload structured-logging-clojure to puppet7 component
  • 14:06 jbond: upload murphy-clojure to puppet7 component
  • 13:57 jbond: upload logstash-logback-encoder-7.2 to puppet7 component
  • 13:49 jbond: upload kitchensink-clojure to puppet7 component
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts with fragile power supply (T314559 T314628)', diff saved to https://phabricator.wikimedia.org/P32292 and previous config saved to /var/cache/conftool/dbconfig/20220805-132709-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:09 sukhe: repool codfw
  • 13:02 jbond: upload honeysql-clojure to puppet7 component
  • 12:53 _joe_: progressive repool of services in codfw
  • 12:24 moritzm: installing nano bugfix updates from bullseye point release
  • 11:50 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:40 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on D3 (T310146)', diff saved to https://phabricator.wikimedia.org/P32291 and previous config saved to /var/cache/conftool/dbconfig/20220805-113729-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C6 (T310145)', diff saved to https://phabricator.wikimedia.org/P32290 and previous config saved to /var/cache/conftool/dbconfig/20220805-113555-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool after PDU maint on C5 (T310145)', diff saved to https://phabricator.wikimedia.org/P32289 and previous config saved to /var/cache/conftool/dbconfig/20220805-113436-ladsgroup.json
  • 10:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 10:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 10:12 Amir1: dbmaint at s4@codfw (T312863)
  • 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 12 hosts with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2001.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for gerrit2002.wikimedia.org
  • 00:53 dzahn@cumin1001: START - Cookbook sre.hosts.remove-downtime for gerrit2002.wikimedia.org
  • 00:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 8 days, 0:00:00 on gerrit2002.wikimedia.org with reason: decom, replaced by gerrit2002
  • 00:18 mutante: restarting gerrit for config change - removing old replica T313250

2022-08-04

  • 23:07 mutante: switching gerrit-replica.wikimedia.org to new machine gerrit2002, dropping gerrit-replica-new.wikimedia.org T313250
  • 21:07 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:56 thcipriani@deploy1002: Finished scap: Backport for gerrit:819774 tkwiki: Update wordmark (duration: 06m 12s)
  • 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:51 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:51 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:50 thcipriani@deploy1002: Started scap: Backport for gerrit:819774 tkwiki: Update wordmark
  • 20:48 thcipriani@deploy1002: Finished scap: Backport for gerrit:812391 [config]: Add click event logging for mobile and desktop (duration: 39m 16s)
  • 20:45 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:24 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:23 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:22 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:13 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:13 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:10 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 20:09 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 20:08 thcipriani@deploy1002: Started scap: Backport for gerrit:812391 [config]: Add click event logging for mobile and desktop
  • 19:59 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:55 dancy@deploy1002: rebuilt and synchronized wikiversions files: resync
  • 19:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-be2001.codfw.wmnet
  • 19:49 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for thanos-be2001.codfw.wmnet
  • 19:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 8 hosts
  • 19:44 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 8 hosts
  • 19:42 Emperor: rebooting thanos-be2001 to fix drive ordering
  • 19:37 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2071.codfw.wmnet
  • 19:37 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2071.codfw.wmnet
  • 19:31 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: T310146
  • 19:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2071.codfw.wmnet with reason: T310146
  • 19:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:12 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 19:11 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 19:11 dancy: There were many errors during php-fpm restart due to failures to contact http://lvs2009:9090/pools/appservers-https_443/mw2361.codfw.wmnet and similar URLs.
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.39.0-wmf.23 refs T308076
  • 19:09 ryankemper@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 19:09 ryankemper@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 19:05 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 19:04 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 19:04 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 19:03 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
  • 19:03 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 19:02 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
  • 19:02 ottomata: roll-restarting eventgate-analytics-external to pick up backwards incompatible schema change - T314151
  • 18:47 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 18:46 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 18:41 cwhite: poweroff kafka-logging2003 - T310145
  • 18:39 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw237[0-6].codfw.wmnet
  • 18:39 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 7 hosts
  • 18:39 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for 7 hosts
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2369.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2369.codfw.wmnet
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2368.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2368.codfw.wmnet
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2367.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2367.codfw.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2369.codfw.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2368.codfw.wmnet
  • 18:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2367.codfw.wmnet
  • 18:35 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2366.codfw.wmnet
  • 18:35 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2366.codfw.wmnet
  • 18:34 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2366.codfw.wmnet
  • 18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2279.codfw.wmnet
  • 18:30 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2278.codfw.wmnet
  • 18:29 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2277.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2276.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2276.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2275.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2275.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2274.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2274.codfw.wmnet
  • 18:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2273.codfw.wmnet
  • 18:29 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2273.codfw.wmnet
  • 18:26 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 02m 39s)
  • 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2272.codfw.wmnet
  • 18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2272.codfw.wmnet
  • 18:24 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2271.codfw.wmnet
  • 18:24 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for mw2271.codfw.wmnet
  • 18:23 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:23 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 32s)
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2276.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2275.codfw.wmnet
  • 18:23 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2274.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2273.codfw.wmnet
  • 18:22 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2272.codfw.wmnet
  • 18:22 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,68].codfw.wmnet PDU work T310145
  • 18:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:21 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
  • 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:21 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 03s)
  • 18:21 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:20 milimetric@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 49s)
  • 18:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 18:20 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 18:19 milimetric@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:14 mutante: mw2272 and upwards: scap pull, checking monitoring, repooling, one by one
  • 18:13 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=mw2271.codfw.wmnet
  • 18:12 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 51s)
  • 18:11 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 18:06 btullis@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 01m 54s)
  • 18:04 btullis@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:43 mutante: maps2008 - downtime and shutdown for D3 maintenance
  • 17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
  • 17:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps2008.codfw.wmnet with reason: codfw reboots
  • 17:42 mutante: thumbor2006 - downtime and shutdown for D3 maintenance
  • 17:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
  • 17:41 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on thumbor2006.codfw.wmnet with reason: codfw reboots
  • 17:39 mutante: mw2386 - systemctl reset-failed
  • 17:31 mutante: phab2001 - systemctl restart ssh-phab, attempting to clear Icinga pybal alerts, related to reboots
  • 17:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:30 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on dns2001.wikimedia.org with reason: shutdown for PDU upgrade
  • 17:28 Amir1: dbmaint at s4@eqiad (T312863)
  • 17:26 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:26 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:24 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:20 mutante: [an-launcher1002:~] $ sudo systemctl reset-failed
  • 17:20 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=ms-fe2012.codfw.wmnet
  • 17:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=varnish-fe
  • 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-be
  • 17:18 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet,service=ats-tls
  • 17:16 Emperor: shutdown of moss-fe2002.codfw.wmnet,ms-be20[37,38,43,61,65,69].codfw.wmnet,ms-fe2012.codfw.wmnet,thanos-fe2003.codfw.wmnet for power work T310146
  • 17:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2035-2036].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 9 hosts with reason: PDU work
  • 17:15 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 9 hosts with reason: PDU work
  • 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=varnish-fe
  • 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-be
  • 17:15 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[56]\.codfw\.wmnet,service=ats-tls
  • 17:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
  • 17:13 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet
  • 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=varnish-fe
  • 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-be
  • 17:12 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2037.codfw.wmnet,service=ats-tls
  • 17:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: T310146
  • 17:12 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2050.codfw.wmnet with reason: T310146
  • 17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 00m 04s)
  • 17:11 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
  • 17:11 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
  • 17:10 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
  • 17:10 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 15s)
  • 17:09 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:07 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2010.codfw.wmnet with reason: shutdown for PDU upgrade
  • 16:55 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
  • 16:51 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288] (duration: 07m 14s)
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2016.codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase202[05].codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase202[05].codfw.wmnet
  • 16:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
  • 16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2553288]
  • 16:43 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288] (duration: 00m 07s)
  • 16:43 ebysans@deploy1002: Started deploy [analytics/refinery@2553288] (thin): Regular analytics weekly train THIN [analytics/refinery@2553288]
  • 16:37 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 18 hosts
  • 16:37 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 18 hosts
  • 16:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: T310145
  • 16:35 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2059.codfw.wmnet with reason: T310145
  • 16:34 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
  • 16:34 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 00m 20s)
  • 16:34 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2003.codfw.wmnet with reason: PDU swap
  • 16:34 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 16:32 ebysans@deploy1002: Finished deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288] (duration: 29m 59s)
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool D3 for PDU maint', diff saved to https://phabricator.wikimedia.org/P32286 and previous config saved to /var/cache/conftool/dbconfig/20220804-163037-ladsgroup.json
  • 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:28 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Start reading from new templatelinks columns in commons (T306673) (duration: 03m 00s)
  • 16:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:26 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:17 brett: deploying authdns - geodns: Map out African countries by DC latency (T311472)
  • 16:12 cwhite: poweroff logstash2028 - T310145
  • 16:06 Emperor: shutdown ms-be20[39,49,54].codfw.wmnet,thanos-be2003 for PDU swap T310145
  • 16:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
  • 16:02 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2036,2049,2054].codfw.wmnet,thanos-be2003.codfw.wmnet with reason: PDU work
  • 16:02 ebysans@deploy1002: Started deploy [analytics/refinery@2553288]: Regular analytics weekly train [analytics/refinery@2553288]
  • 15:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: T310145
  • 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2048.codfw.wmnet with reason: T310145
  • 15:43 damilare: payments-wiki upgraded from 0e4a5b3b to 6880236d
  • 15:37 _joe_: uncordoning ml-serve200{1,6}
  • 15:27 sukhe: power off cp2037,cp2038: PDU upgrade
  • 15:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
  • 15:25 jelto: power off phab2001
  • 15:25 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:30:00 on phab2001.codfw.wmnet with reason: PDU swap
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:24 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp[2037-2038].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:24 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=varnish-fe
  • 15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-be
  • 15:23 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[78]\.codfw\.wmnet,service=ats-tls
  • 15:21 XioNoX: un-drain codfw-ulsfo link - T310310
  • 15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance (T310145)
  • 15:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2116,2127,2167-2168].codfw.wmnet,es2022.codfw.wmnet with reason: Maintenance (T310145)
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool C6 for PDU maint (T310145)', diff saved to https://phabricator.wikimedia.org/P32285 and previous config saved to /var/cache/conftool/dbconfig/20220804-151958-ladsgroup.json
  • 15:16 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 15:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
  • 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on restbase[2016,2020,2025].codfw.wmnet with reason: PDU maintenance
  • 15:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance (T310145)
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 10:00:00 on db[2114,2126,2166].codfw.wmnet with reason: Maintenance (T310145)
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=varnish-fe
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-be
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp203[12]\.codfw\.wmnet,service=ats-tls
  • 15:12 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2058,2064].codfw.wmnet
  • 15:12 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2058,2064].codfw.wmnet
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool hosts for PDU maint (T310145)', diff saved to https://phabricator.wikimedia.org/P32284 and previous config saved to /var/cache/conftool/dbconfig/20220804-151121-ladsgroup.json
  • 15:09 godog: poweroff logstash2002 - T310145
  • 15:07 _joe_: powering down mc203{0,1}
  • 15:07 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
  • 15:06 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2002.codfw.wmnet with reason: pdu
  • 15:05 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:58 jelto: power off mc20[30-31]
  • 14:56 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
  • 14:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mc[2030-2031].codfw.wmnet with reason: PDU swap
  • 14:56 XioNoX: draining codfw-ulsfo link - T310310
  • 14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2025.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2020.codfw.wmnet
  • 14:35 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2016.codfw.wmnet
  • 14:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: T310145
  • 14:31 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2011.codfw.wmnet with reason: T310145
  • 14:25 jelto: power off gitlab-runner2003
  • 14:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
  • 14:25 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: T310145
  • 14:24 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2001.codfw.wmnet with reason: T310145
  • 14:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 4:30:00 on gitlab-runner2003.codfw.wmnet with reason: PDU swap
  • 14:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: T310145
  • 14:22 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2032.codfw.wmnet with reason: T310145
  • 14:22 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
  • 14:22 godog: poweroff logstash2035 - T310145
  • 14:22 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on logstash2035.codfw.wmnet with reason: pdu
  • 14:21 Emperor: shutdown ms-be20[58,64].codfw.wmnet for PDU swap T310145
  • 14:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:14 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:13 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Remove unused $wgMathUseRestBase (T274436) (duration: 03m 01s)
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: CommonSettings-labs: Fix usage of $wgSFSValidateIPListLocationMD5 (duration: 02m 51s)
  • 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: T310145
  • 14:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2033.codfw.wmnet with reason: T310145
  • 14:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:59 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: wikitech: Remove old LDAP config vars (duration: 02m 54s)
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
  • 13:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2058,2064].codfw.wmnet with reason: PDU work
  • 13:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove unused $wgIncludejQueryMigrate (T280944) (2/2) (duration: 03m 03s)
  • 13:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:45 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgIncludejQueryMigrate (T280944) (1/2) (duration: 02m 58s)
  • 13:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: T310145
  • 13:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2066.codfw.wmnet with reason: T310145
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove unused $wgLegacyJavaScriptGlobals (T72470) (2/2) (duration: 02m 59s)
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove unused $wgLegacyJavaScriptGlobals (T72470) (1/2) (duration: 02m 58s)
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:26 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/SearchSettingsForSDC.php: Config: Remove unused $wgWBCSEnableDispatchingQueryBuilder (duration: 03m 01s)
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 taavi@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove unused CA P3P config (duration: 03m 09s)
  • 13:14 jbond: introduce new puppetmaster backends puppetmaster[12]004
  • 13:14 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: T310145
  • 13:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2065.codfw.wmnet with reason: T310145
  • 13:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 taavi@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: QuickSurveys: Deploy research incentive survey to Bengali wiki (T314333) (duration: 03m 26s)
  • 13:07 moritzm: installing jetty9 security updates
  • 12:48 moritzm: installing Linux 4.19.249 kernels on Buster hosts
  • 12:03 jbond: send sretest100[12] and idp-test2001 to the new puppetmaster[12]004 servers to test
  • 11:46 moritzm: installing Linux 5.10.127-2 kernels on Bullseye hosts
  • 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2017.codfw.wmnet to cluster codfw and group D
  • 11:41 moritzm: installing libpgjava security updates
  • 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2017.codfw.wmnet to cluster codfw and group D
  • 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2017.codfw.wmnet with OS bullseye
  • 10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2015.codfw.wmnet
  • 10:53 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2022.codfw.wmnet
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2017.codfw.wmnet with reason: host reimage
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2017.codfw.wmnet with OS bullseye
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2017.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 10:19 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:00:00 on 32 hosts with reason: PDU swap
  • 10:19 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 9:00:00 on 32 hosts with reason: PDU swap
  • 10:03 Lucas_WMDE: stashbot temporarily parted and lost several logs between 9:42 UTC and 9:49 UTC; mainly mwdebug helmfile start/done, also ayounsi sre.deploy.python-code cookbook to cumin1001, cumin2002; see IRC logs
  • 10:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
  • 10:00 jynus: stop db2099 T310145
  • 10:00 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update requirements + wmf-netbox - ayounsi@cumin1001
  • 09:39 jelto: power off mw22[71-79].codfw.wmnet
  • 09:38 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/GrowthExperiments/includes/EventLogging/SpecialEditGrowthConfigLogger.php: ba67dd9: SpecialEditGrowthConfigLogger: Update schema version (T314173, T312148) (duration: 03m 18s)
  • 09:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2177 to s3 T311494', diff saved to https://phabricator.wikimedia.org/P32282 and previous config saved to /var/cache/conftool/dbconfig/20220804-093704-marostegui.json
  • 09:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:35 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ddcd333: testwiki: Growth: Assign enrollasmentor to * (T310905) (duration: 03m 41s)
  • 09:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:32 jelto: set/pooled=inactive mw22[71-79].codfw.wmnet
  • 09:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 9:30:00 on 9 hosts with reason: PDU swap
  • 09:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 9:30:00 on 9 hosts with reason: PDU swap
  • 09:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
  • 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2089.codfw.wmnet
  • 09:26 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0614a39: testwiki: Growth: Switch to structured mentor list (T310905) (duration: 03m 38s)
  • 09:25 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: wmf-netbox.py update - ayounsi@cumin1001
  • 09:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:23 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 09:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2089.codfw.wmnet
  • 09:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=kubernetes2022.codfw.wmnet
  • 09:03 oblivian@mwmaint1002: pull aborted: (duration: 00m 06s)
  • 08:58 moritzm: installing gsasl security updates
  • 08:57 oblivian@mwmaint1002: pull aborted: (duration: 00m 18s)
  • 08:48 moritzm: draining ganeti2017 T311686
  • 08:45 jelto: power off kubernetes2022
  • 08:43 oblivian@deploy1002: Synchronized README: testing new scap configuration (duration: 03m 18s)
  • 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
  • 08:38 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:22:00 on kubernetes2022.codfw.wmnet with reason: PDU swap
  • 08:37 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2022.codfw.wmnet
  • 08:35 jelto: kubectl drain kubernetes2022.codfw.wmnet
  • 08:32 jelto: kubectl cordon kubernetes2022.codfw.wmnet
  • 08:28 moritzm: imported gsasl 1.8.0-8+wmf1 to stretch-wikimedia
  • 08:26 jelto: power off mc2049 and mc2050
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:36:00 on mc[2049-2050].codfw.wmnet with reason: PDU swap
  • 08:22 oblivian@mwmaint1002: pull aborted: (duration: 00m 11s)
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132, db111, db1127, db1143', diff saved to https://phabricator.wikimedia.org/P32281 and previous config saved to /var/cache/conftool/dbconfig/20220804-081958-root.json
  • 08:19 jelto: power off mc2047 and mc2048
  • 08:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
  • 08:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 10:45:00 on mc[2047-2048].codfw.wmnet with reason: PDU swap
  • 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:55 marostegui: Remove grants for 208.80.154.160/208.80.155.109 T314528
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2089 from dbctl T313799', diff saved to https://phabricator.wikimedia.org/P32280 and previous config saved to /var/cache/conftool/dbconfig/20220804-074957-marostegui.json
  • 07:47 godog: grow sda/sdb 3 by 100G on thanos-be2002 - T314275
  • 07:46 godog: grow sda/sdb 3 by 100G on thanos-be1003 - T314275
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:29 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2135,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 07:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2030.codfw.wmnet to cluster codfw and group A
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
  • 07:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2023-2025].codfw.wmnet with reason: codfw pdu maintenance
  • 07:05 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 06:58 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 06:06 _joe_: restarted memcached on mc2038 to pick up the actual production configuration
  • 05:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2030.codfw.wmnet with OS bullseye
  • 05:49 kart_: Updated cxserver to 2022-08-04-022612-production (T313296, T308248)
  • 05:44 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:43 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
  • 05:39 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:38 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2030.codfw.wmnet with reason: host reimage
  • 05:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2030.codfw.wmnet with OS bullseye
  • 05:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 05:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2030.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 04:38 ejegg: payments-wiki upgraded from 712df4ce to 0e4a5b3b
  • 04:29 TimStarling: on mw2377 fiddling with CPU frequency control and doing benchmarks
  • 04:09 krinkle@mwmaint1002: pull aborted: (duration: 00m 05s)
  • 01:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32278 and previous config saved to /var/cache/conftool/dbconfig/20220804-012341-marostegui.json
  • 01:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32277 and previous config saved to /var/cache/conftool/dbconfig/20220804-010834-marostegui.json
  • 00:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P32276 and previous config saved to /var/cache/conftool/dbconfig/20220804-005328-marostegui.json
  • 00:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32275 and previous config saved to /var/cache/conftool/dbconfig/20220804-003822-marostegui.json
  • 00:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32274 and previous config saved to /var/cache/conftool/dbconfig/20220804-003611-marostegui.json
  • 00:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32273 and previous config saved to /var/cache/conftool/dbconfig/20220804-003549-marostegui.json
  • 00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32272 and previous config saved to /var/cache/conftool/dbconfig/20220804-002043-marostegui.json
  • 00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started.. T313250
  • 00:06 mutante: gerrit - [2022-08-04 00:05:33,173] Replication to gerrit2@gerrit2002.wikimedia.org:/srv/gerrit/git/analytics/geowiki.git started... [CONTEXT pushOneId="83ad5008" ]
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P32271 and previous config saved to /var/cache/conftool/dbconfig/20220804-000536-marostegui.json
  • 00:03 mutante: gerrit - service restart to deploy config change to add second replica T313250
  • 00:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit.wikimedia.org with reason: service restart
  • 00:00 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit.wikimedia.org with reason: service restart
  • 00:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart

2022-08-03

  • 23:59 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: service restart
  • 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32270 and previous config saved to /var/cache/conftool/dbconfig/20220803-235030-marostegui.json
  • 22:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32269 and previous config saved to /var/cache/conftool/dbconfig/20220803-225015-marostegui.json
  • 22:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 9 hosts with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 9 hosts with reason: Maintenance
  • 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32268 and previous config saved to /var/cache/conftool/dbconfig/20220803-224827-marostegui.json
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32267 and previous config saved to /var/cache/conftool/dbconfig/20220803-223321-marostegui.json
  • 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P32266 and previous config saved to /var/cache/conftool/dbconfig/20220803-221815-marostegui.json
  • 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32265 and previous config saved to /var/cache/conftool/dbconfig/20220803-220309-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T312972)', diff saved to https://phabricator.wikimedia.org/P32264 and previous config saved to /var/cache/conftool/dbconfig/20220803-220057-marostegui.json
  • 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32263 and previous config saved to /var/cache/conftool/dbconfig/20220803-220007-marostegui.json
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32262 and previous config saved to /var/cache/conftool/dbconfig/20220803-214501-marostegui.json
  • 21:44 damilare: payments-wiki updated from e1b6036a to 712df4ce
  • 21:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - T314078
  • 21:35 ryankemper@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 21:35 ryankemper@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 21:30 ryankemper@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 21:30 ryankemper@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 21:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P32261 and previous config saved to /var/cache/conftool/dbconfig/20220803-212955-marostegui.json
  • 21:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32260 and previous config saved to /var/cache/conftool/dbconfig/20220803-211449-marostegui.json
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T312972)', diff saved to https://phabricator.wikimedia.org/P32259 and previous config saved to /var/cache/conftool/dbconfig/20220803-211237-marostegui.json
  • 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32258 and previous config saved to /var/cache/conftool/dbconfig/20220803-211216-marostegui.json
  • 21:03 ejegg: updated standalone SmashPig deployment from 8e8f0017 to 9b97ea15
  • 21:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32257 and previous config saved to /var/cache/conftool/dbconfig/20220803-205710-marostegui.json
  • 20:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:55 ebernhardson@deploy1002: Synchronized wmf-config/CirrusSearch-production.php: Config: cirrus: Set ElasticaWrite partition count for cloudelastic to 3 (duration: 03m 29s)
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:54 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:43 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/VisualEditor/includes/VisualEditorParsoidClient.php: a804fe1: Update call to PageConfigFactory::create to use new signature (T314523) (duration: 03m 25s)
  • 20:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P32256 and previous config saved to /var/cache/conftool/dbconfig/20220803-204204-marostegui.json
  • 20:39 urbanecm@deploy1002: sync-file aborted: a804fe1: Update call to PageConfigFactory::create to use new signature (T314523ú (duration: 00m 00s)
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/DiscussionTools/: b840eef: Fix ReplyLinksController#teardown (duration: 03m 27s)
  • 20:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:31 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/: 70a18f5: Add explicit partitioning key to ElasticaWrite (T314426) (duration: 03m 13s)
  • 20:28 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:28 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/CirrusSearch/: 9961e9b: Add explicit partitioning key to ElasticaWrite (T314426) (duration: 03m 23s)
  • 20:28 cwhite@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host logstash2032.codfw.wmnet
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32255 and previous config saved to /var/cache/conftool/dbconfig/20220803-202658-marostegui.json
  • 20:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T312972)', diff saved to https://phabricator.wikimedia.org/P32254 and previous config saved to /var/cache/conftool/dbconfig/20220803-202146-marostegui.json
  • 20:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 20:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 20:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32253 and previous config saved to /var/cache/conftool/dbconfig/20220803-202125-marostegui.json
  • 20:14 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 20:13 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 195f809: Start writing to cuc_actor on test wikis (T233004) (duration: 03m 27s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) logstash2032.codfw.wmnet on all recursors
  • 20:08 cwhite@cumin2002: START - Cookbook sre.dns.wipe-cache logstash2032.codfw.wmnet on all recursors
  • 20:08 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 mutante: gerrit - adding second replica T313250
  • 20:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32252 and previous config saved to /var/cache/conftool/dbconfig/20220803-200619-marostegui.json
  • 20:04 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 20:03 cwhite@cumin2002: START - Cookbook sre.ganeti.makevm for new host logstash2032.codfw.wmnet
  • 20:00 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2012.codfw.wmnet
  • 20:00 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2012.codfw.wmnet
  • 20:00 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2012.codfw.wmnet
  • 19:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P32251 and previous config saved to /var/cache/conftool/dbconfig/20220803-195113-marostegui.json
  • 19:40 ryankemper: T314078 Forgot to mention, restart is at `ryankemper@cumin1001` tmux session `codfw_restarts`
  • 19:39 ryankemper: T314078 Rolling upgrade of codfw hosts; after this all of eqiad/codfw will have the new plugin version and we can resume the `search-loader` instances: `sudo -E cookbook sre.elasticsearch.rolling-operation search_codfw "codfw cluster plugin upgrade" --upgrade --nodes-per-run 3 --start-datetime 2022-08-03T19:38:10 --task-id T314078`
  • 19:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster plugin upgrade - ryankemper@cumin1001 - T314078
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32250 and previous config saved to /var/cache/conftool/dbconfig/20220803-193607-marostegui.json
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T312972)', diff saved to https://phabricator.wikimedia.org/P32249 and previous config saved to /var/cache/conftool/dbconfig/20220803-193354-marostegui.json
  • 19:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T312972)', diff saved to https://phabricator.wikimedia.org/P32248 and previous config saved to /var/cache/conftool/dbconfig/20220803-193334-marostegui.json
  • 19:25 mutante: gerrit1001 - rsyncing /var/lib/gerrit/review_site/ over to gerrit2002 815401
  • 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32247 and previous config saved to /var/cache/conftool/dbconfig/20220803-191828-marostegui.json
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P32246 and previous config saved to /var/cache/conftool/dbconfig/20220803-190321-marostegui.json
  • 18:56 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes2011.codfw.wmnet
  • 18:56 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes2011.codfw.wmnet
  • 18:56 rzl@deploy1002: conftool action : set/pooled=yes; selector: name=kubernetes2011.codfw.wmnet
  • 18:33 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2027,2037].codfw.wmnet
  • 18:33 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2027,2037].codfw.wmnet
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:16 dancy@deploy1002: Synchronized php: group1 wikis to 1.39.0-wmf.23 refs T308076 (duration: 03m 37s)
  • 18:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:12 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.39.0-wmf.23 refs T308076
  • 17:58 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage2002.codfw.wmnet
  • 17:58 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubestage2002.codfw.wmnet
  • 17:57 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc[2025-2026].codfw.wmnet
  • 17:57 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc[2025-2026].codfw.wmnet
  • 17:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2044.codfw.wmnet
  • 17:57 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2044.codfw.wmnet
  • 17:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for elastic2043.codfw.wmnet
  • 17:56 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for elastic2043.codfw.wmnet
  • 17:55 ottomata: increasing partitions from 5 to 6 for *.cpjobqueue.partitioned.mediawiki.job.cirrusSearchElasticaWrite topics in Kafka main-eqiad and main-codfw - T314426
  • 17:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be2055.codfw.wmnet
  • 17:55 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be2055.codfw.wmnet
  • 17:50 rzl@cumin1001: conftool action : set/pooled=yes; selector: name=kubestage2002.codfw.wmnet
  • 17:38 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse[2008-2010].codfw.wmnet
  • 17:38 rzl@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse[2008-2010].codfw.wmnet
  • 17:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase20[12]4.codfw.wmnet
  • 17:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 17:14 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 17:08 ryankemper: T310145 `elastic2031` and `wcqs2002` powered off in preparation for C1 maintenance
  • 17:06 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet)
  • 17:00 btullis@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:48 Emperor: shutdown moss-fe2001.codfw.wmnet,ms-fe2011.codfw.wmnet,ms-be20[34,35,42,48,55,68].codfw.wmnet PDU work T310145
  • 16:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 16:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
  • 16:47 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 8 hosts with reason: PDU work
  • 16:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: in setup / flapping
  • 16:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
  • 16:46 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet
  • 16:40 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2046.codfw.wmnet
  • 16:40 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2046.codfw.wmnet
  • 16:39 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 10 hosts
  • 16:39 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 10 hosts
  • 16:38 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2023.codfw.wmnet
  • 16:38 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for mc2023.codfw.wmnet
  • 16:37 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
  • 16:37 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on gitlab-runner2002.codfw.wmnet with reason: PDU swap
  • 16:35 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
  • 16:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc[2025-2026].codfw.wmnet with reason: PDU swap
  • 16:32 jelto: power off mc2025-2026
  • 16:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for rdb2008.codfw.wmnet
  • 16:30 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for rdb2008.codfw.wmnet
  • 16:28 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 16:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2009-2010,2020].codfw.wmnet
  • 16:27 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for kubernetes[2009-2010,2020].codfw.wmnet
  • 16:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 12 hosts
  • 16:11 jelto@cumin1001: START - Cookbook sre.hosts.remove-downtime for 12 hosts
  • 16:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
  • 16:08 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
  • 16:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs[2005-2008].codfw.wmnet
  • 16:08 mvernon@cumin1001: START - Cookbook sre.hosts.remove-downtime for aqs[2005-2008].codfw.wmnet
  • 15:59 Emperor: shutdown ms-be20[33,47],thanos-be2002 prior to PDU work T310070
  • 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
  • 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ms-be[2033,2047].codfw.wmnet,thanos-be2002.codfw.wmnet with reason: PDU work
  • 15:52 jelto: pooling mw2259-2270 again
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32242 and previous config saved to /var/cache/conftool/dbconfig/20220803-154515-marostegui.json
  • 15:38 vgutierrez: clearing ats-be cache on cp6008 - T309651
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:36 elukey: powercycle kafka-logging2003 - not responsive to serial console
  • 15:36 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/GrowthExperiments/includes/NewcomerTasks/AddImage/ServiceImageRecommendationProvider.php: 4438957: ServiceImageRecommendationProvider: Add extra logging when no JSON response received (T313973) (duration: 03m 04s)
  • 15:35 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
  • 15:35 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: PDU maintenance
  • 15:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 15:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
  • 15:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase2024.codfw.wmnet with reason: PDU maintenance
  • 15:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2024.codfw.wmnet
  • 15:30 vgutierrez: clearing ats-be cache on cp6016 - T309651
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32241 and previous config saved to /var/cache/conftool/dbconfig/20220803-153009-marostegui.json
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
  • 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
  • 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
  • 15:24 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
  • 15:24 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
  • 15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase2021.codfw.wmnet
  • 15:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: T310070
  • 15:19 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2030.codfw.wmnet with reason: T310070
  • 15:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P32240 and previous config saved to /var/cache/conftool/dbconfig/20220803-151502-marostegui.json
  • 15:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
  • 15:10 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
  • 15:04 jelto: power off mc2023
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32239 and previous config saved to /var/cache/conftool/dbconfig/20220803-145956-marostegui.json
  • 14:59 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
  • 14:59 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on mc2023.codfw.wmnet with reason: PDU swap
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T312972)', diff saved to https://phabricator.wikimedia.org/P32238 and previous config saved to /var/cache/conftool/dbconfig/20220803-145849-marostegui.json
  • 14:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 14:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32237 and previous config saved to /var/cache/conftool/dbconfig/20220803-145828-marostegui.json
  • 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:53 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.19 (duration: 05m 37s)
  • 14:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:47 dancy@deploy1002: Pruned MediaWiki: 1.39.0-wmf.21 (duration: 06m 13s)
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32236 and previous config saved to /var/cache/conftool/dbconfig/20220803-144322-marostegui.json
  • 14:34 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: T310070
  • 14:33 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2029.codfw.wmnet with reason: T310070
  • 14:32 Emperor: shutdown aqs200[5-8] prior to PDU work T310070
  • 14:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
  • 14:31 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
  • 14:31 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs[2005-2008].codfw.wmnet with reason: PDU work
  • 14:31 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on thumbor[2003-2004].codfw.wmnet with reason: PDU swap
  • 14:28 jelto: power off thumbor2003 and thumbor2004
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P32235 and previous config saved to /var/cache/conftool/dbconfig/20220803-142816-marostegui.json
  • 14:27 moritzm: upgrading ganeti/esams to Ganeti 3.0.2 T312637
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32234 and previous config saved to /var/cache/conftool/dbconfig/20220803-141310-marostegui.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T312972)', diff saved to https://phabricator.wikimedia.org/P32233 and previous config saved to /var/cache/conftool/dbconfig/20220803-141103-marostegui.json
  • 14:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32232 and previous config saved to /var/cache/conftool/dbconfig/20220803-141042-marostegui.json
  • 14:06 moritzm: installing freetype security updates on bullseye
  • 13:57 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕙☕ sudo cumin 'P{R:Class = Confd}' 'systemctl restart confd'
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32231 and previous config saved to /var/cache/conftool/dbconfig/20220803-135536-marostegui.json
  • 13:46 cdanis: ✔️ cdanis@deploy1002.eqiad.wmnet ~ 🕙☕ sudo systemctl restart confd
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P32230 and previous config saved to /var/cache/conftool/dbconfig/20220803-134030-marostegui.json
  • 13:30 moritzm: installing Java 8 security updates for Buster
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32229 and previous config saved to /var/cache/conftool/dbconfig/20220803-132524-marostegui.json
  • 13:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32228 and previous config saved to /var/cache/conftool/dbconfig/20220803-131916-marostegui.json
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32227 and previous config saved to /var/cache/conftool/dbconfig/20220803-131855-marostegui.json
  • 13:18 sukhe: depool codfw for PDU upgrade: CR 819798
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:16 urbanecm@deploy1002: Synchronized wmf-config/MetaContactPages.php: f89f02e: Amend license request contact form per Legal (T303359) (duration: 09m 27s)
  • 13:12 jbond: introduce puppetmaster[12]004 for now as offline
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
  • 13:09 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on kafka-logging2003.codfw.wmnet with reason: pdu
  • 13:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: T310070
  • 13:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2044.codfw.wmnet with reason: T310070
  • 13:04 pt1979@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32226 and previous config saved to /var/cache/conftool/dbconfig/20220803-130348-marostegui.json
  • 12:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: T310070
  • 12:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2043.codfw.wmnet with reason: T310070
  • 12:56 pt1979@cumin1001: START - Cookbook sre.dns.netbox
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P32224 and previous config saved to /var/cache/conftool/dbconfig/20220803-124842-marostegui.json
  • 12:40 moritzm: uploaded openjdk-8 8u342-b07-1~deb10u1 to component/jdk8 for buster-wikimedia (rebuild of latest Java 8 security update)
  • 12:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32223 and previous config saved to /var/cache/conftool/dbconfig/20220803-123336-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T312972)', diff saved to https://phabricator.wikimedia.org/P32222 and previous config saved to /var/cache/conftool/dbconfig/20220803-122929-marostegui.json
  • 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32221 and previous config saved to /var/cache/conftool/dbconfig/20220803-122819-marostegui.json
  • 12:16 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@614f7b2]: (no justification provided) (duration: 00m 11s)
  • 12:16 ebysans@deploy1002: Started deploy [airflow-dags/analytics@614f7b2]: (no justification provided)
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32220 and previous config saved to /var/cache/conftool/dbconfig/20220803-121313-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P32219 and previous config saved to /var/cache/conftool/dbconfig/20220803-115807-marostegui.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2176 to s1 T311494', diff saved to https://phabricator.wikimedia.org/P32218 and previous config saved to /var/cache/conftool/dbconfig/20220803-115706-marostegui.json
  • 11:49 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, T310145
  • 11:49 root@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cumin2002.codfw.wmnet with reason: PDU maintenance, T310145
  • 11:46 jayme@cumin1001: conftool action : set/weight=10; selector: name=(kubernetes2019.codfw.wmnet|kubernetes2021.codfw.wmnet|kubernetes2022.codfw.wmnet|kubernetes2018.codfw.wmnet|kubernetes2020.codfw.wmnet)
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32217 and previous config saved to /var/cache/conftool/dbconfig/20220803-114301-marostegui.json
  • 11:41 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=(kubernetes2020.codfw.wmnet|kubernetes2009.codfw.wmnet|kubernetes2010.codfw.wmnet|kubernetes2011.codfw.wmnet|kubernetes2012.codfw.wmnet|kubestage2002.codfw.wmnet)
  • 11:38 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host restbase2022.codfw.wmnet
  • 11:37 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2022.codfw.wmnet
  • 11:35 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:32 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 11:26 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=wdqs
  • 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=kartotherian
  • 11:22 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-backend
  • 11:21 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=restbase-async
  • 11:17 _joe_: depooling codfw services from all traffic
  • 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2011.codfw.wmnet to cluster codfw and group C
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2011.codfw.wmnet to cluster codfw and group C
  • 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 10:47 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
  • 10:46 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubestage2002.codfw.wmnet with reason: PDU swap
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T312972)', diff saved to https://phabricator.wikimedia.org/P32216 and previous config saved to /var/cache/conftool/dbconfig/20220803-104246-marostegui.json
  • 10:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32215 and previous config saved to /var/cache/conftool/dbconfig/20220803-104224-marostegui.json
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 10:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase201[45].codfw.wmnet
  • 10:38 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2022.codfw.wmnet
  • 10:38 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
  • 10:38 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on restbase[2014-2015,2021-2022].codfw.wmnet with reason: PDU maintenance
  • 10:37 jelto: shutdown kubestage2002 kubernetes2020 kubernetes2009 kubernetes2010 kubernetes2011 kubernetes2012
  • 10:30 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
  • 10:30 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
  • 10:29 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 10:29 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 10:27 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32213 and previous config saved to /var/cache/conftool/dbconfig/20220803-102718-marostegui.json
  • 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2012.codfw.wmnet
  • 10:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2011.codfw.wmnet
  • 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
  • 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2009.codfw.wmnet
  • 10:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2020.codfw.wmnet
  • 10:20 jelto@cumin1001: conftool action : set/pooled=no; selector: name=kubestage2002.codfw.wmnet
  • 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) proton.discovery.wmnet on all recursors
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2011.codfw.wmnet with OS bullseye
  • 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache proton.discovery.wmnet on all recursors
  • 10:14 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 10:14 oblivian@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P32212 and previous config saved to /var/cache/conftool/dbconfig/20220803-101212-marostegui.json
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32211 and previous config saved to /var/cache/conftool/dbconfig/20220803-095706-marostegui.json
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
  • 09:56 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase2021.codfw.wmnet
  • 09:56 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2012.codfw.wmnet
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T312972)', diff saved to https://phabricator.wikimedia.org/P32210 and previous config saved to /var/cache/conftool/dbconfig/20220803-095559-marostegui.json
  • 09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 09:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32209 and previous config saved to /var/cache/conftool/dbconfig/20220803-095538-marostegui.json
  • 09:55 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=restbase2027.codfw.wmnet
  • 09:54 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2011.codfw.wmnet
  • 09:54 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:54 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2011.codfw.wmnet with reason: host reimage
  • 09:52 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2010.codfw.wmnet
  • 09:50 jelto: kubectl drain --ignore-daemonsets --delete-local-data kubernetes2009.codfw.wmnet
  • 09:49 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
  • 09:48 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 49 hosts with reason: PDU swap
  • 09:47 jelto: kubectl drain --ignore-daemonsets kubernetes2020.codfw.wmnet
  • 09:46 jelto: kubectl cordon kubernetes2020.codfw.wmnet kubernetes2009.codfw.wmnet kubernetes2010.codfw.wmnet kubernetes2011.codfw.wmnet kubernetes2012.codfw.wmnet
  • 09:43 jelto: kubectl drain --ignore-daemonsets kubestage2002.codfw.wmnet
  • 09:43 vgutierrez: rolling restart of pybal in codfw lvs instances - T310070
  • 09:42 jelto: kubectl cordon kubestage2002
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32208 and previous config saved to /var/cache/conftool/dbconfig/20220803-094032-marostegui.json
  • 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2011.codfw.wmnet with OS bullseye
  • 09:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@674bb8b]: (no justification provided) (duration: 00m 10s)
  • 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2090.codfw.wmnet
  • 09:33 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:33 ebysans@deploy1002: Started deploy [airflow-dags/analytics@674bb8b]: (no justification provided)
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2011.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 09:29 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2090.codfw.wmnet
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P32207 and previous config saved to /var/cache/conftool/dbconfig/20220803-092525-marostegui.json
  • 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:24 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:24 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:23 oblivian@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 09:23 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:22 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)
  • 09:22 oblivian@cumin1001: START - Cookbook sre.discovery.service-route
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2090 from dbctl T314109', diff saved to https://phabricator.wikimedia.org/P32206 and previous config saved to /var/cache/conftool/dbconfig/20220803-092053-marostegui.json
  • 09:20 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
  • 09:15 jelto: power on mc2024
  • 09:10 XioNoX: configure BGP on the esams-drmrs link - T307221
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32205 and previous config saved to /var/cache/conftool/dbconfig/20220803-091019-marostegui.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T312972)', diff saved to https://phabricator.wikimedia.org/P32204 and previous config saved to /var/cache/conftool/dbconfig/20220803-090912-marostegui.json
  • 09:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32203 and previous config saved to /var/cache/conftool/dbconfig/20220803-090836-marostegui.json
  • 09:07 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2032.codfw.wmnet
  • 09:06 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2042.codfw.wmnet
  • 09:05 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 09:04 jynus: stop backup2006 backup2009 for T310070
  • 09:00 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
  • 09:00 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
  • 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2031.codfw.wmnet
  • 08:59 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp2032.codfw.wmnet
  • 08:58 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 08:58 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet
  • 08:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2024.codfw.wmnet
  • 08:57 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 08:57 oblivian@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 08:54 XioNoX: put the esams-drmrs link in service - T307221
  • 08:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32202 and previous config saved to /var/cache/conftool/dbconfig/20220803-085330-marostegui.json
  • 08:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 oblivian@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 08:49 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:47 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:41 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P32201 and previous config saved to /var/cache/conftool/dbconfig/20220803-083824-marostegui.json
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32200 and previous config saved to /var/cache/conftool/dbconfig/20220803-082318-marostegui.json
  • 08:19 jynus: stop db2098 for T310070
  • 08:17 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2072.codfw.wmnet
  • 08:15 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:54 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2072.codfw.wmnet
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2072 from dbctl T313911', diff saved to https://phabricator.wikimedia.org/P32199 and previous config saved to /var/cache/conftool/dbconfig/20220803-074806-marostegui.json
  • 07:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T312972)', diff saved to https://phabricator.wikimedia.org/P32197 and previous config saved to /var/cache/conftool/dbconfig/20220803-072253-marostegui.json
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32196 and previous config saved to /var/cache/conftool/dbconfig/20220803-072214-marostegui.json
  • 07:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2134,2160].codfw.wmnet with reason: codfw pdu maintenance
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es[2020-2022].codfw.wmnet with reason: codfw pdu maintenance
  • 07:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: codfw pdu maintenance
  • 07:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 07:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
  • 07:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 07:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db[2096,2101,2115,2131].codfw.wmnet with reason: codfw pdu maintenance
  • 07:11 kartik@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: CX: Set MT threshold for publishing in Armenian WP to 80% (T313208) (duration: 03m 49s)
  • 07:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32195 and previous config saved to /var/cache/conftool/dbconfig/20220803-070708-marostegui.json
  • 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubestagetcd2002.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:00 moritzm: draining ganeti2011 T311686
  • 06:56 godog: grow sda/sdb 3 by 100G on thanos-be2003 - T314275
  • 06:56 godog: grow sda/sdb 3 by 100G on thanos-be1002 - T314275
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P32194 and previous config saved to /var/cache/conftool/dbconfig/20220803-065202-marostegui.json
  • 06:46 godog: power up centrallog2002 and prometheus2005 - T310070
  • 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 06:37 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2013.codfw.wmnet to cluster codfw and group C
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32193 and previous config saved to /var/cache/conftool/dbconfig/20220803-063656-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T312972)', diff saved to https://phabricator.wikimedia.org/P32192 and previous config saved to /var/cache/conftool/dbconfig/20220803-063148-marostegui.json
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 13 hosts with reason: Maintenance
  • 06:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 13 hosts with reason: Maintenance
  • 06:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32191 and previous config saved to /var/cache/conftool/dbconfig/20220803-063045-marostegui.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32190 and previous config saved to /var/cache/conftool/dbconfig/20220803-061538-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P32189 and previous config saved to /var/cache/conftool/dbconfig/20220803-060032-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32188 and previous config saved to /var/cache/conftool/dbconfig/20220803-054526-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T312972)', diff saved to https://phabricator.wikimedia.org/P32187 and previous config saved to /var/cache/conftool/dbconfig/20220803-054106-marostegui.json
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
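
The cookbook entries above follow a regular START / END (PASS|FAIL) shape, so failed runs can be picked out mechanically. The following is a minimal sketch in Python, assuming the entry layout seen in this archive ("• HH:MM actor: message"); the names ENTRY_RE, END_RE, CookbookEnd, parse_cookbook_end and failed_runs are hypothetical helpers made up for this illustration, not part of any Wikimedia tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed entry layout, inferred from this archive: "• HH:MM actor: message".
# The actor is either "user@host" or a bare nickname, and never contains ":".
ENTRY_RE = re.compile(
    r"^\s*•\s+(?P<time>\d{2}:\d{2})\s+(?P<actor>[^:\s]+):\s+(?P<message>.*)$"
)

# Cookbook END messages carry a PASS/FAIL status and an exit code.
END_RE = re.compile(
    r"^END \((?P<status>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<exit_code>\d+)\)\s*(?P<detail>.*)$"
)


class CookbookEnd(NamedTuple):
    """One finished cookbook run (hypothetical record type for this sketch)."""
    time: str
    actor: str
    cookbook: str
    status: str
    exit_code: int
    detail: str


def parse_cookbook_end(line: str) -> Optional[CookbookEnd]:
    """Return a CookbookEnd for a cookbook END entry, or None for any other line."""
    entry = ENTRY_RE.match(line)
    if entry is None:
        return None
    end = END_RE.match(entry["message"])
    if end is None:
        return None
    return CookbookEnd(
        time=entry["time"],
        actor=entry["actor"],
        cookbook=end["cookbook"],
        status=end["status"],
        exit_code=int(end["exit_code"]),
        detail=end["detail"],
    )


def failed_runs(lines: Iterable[str]) -> List[CookbookEnd]:
    """Keep only the cookbook runs that ended with status FAIL."""
    runs = (parse_cookbook_end(line) for line in lines)
    return [run for run in runs if run is not None and run.status == "FAIL"]


# Sample entries copied from the 2022-08-03 log above.
SAMPLE = [
    "  • 09:22 oblivian@cumin1001: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99)",
    "  • 09:08 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2031.codfw.wmnet",
    "  • 09:00 jelto@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc2024.codfw.wmnet",
    "  • 09:15 jelto: power on mc2024",
]

if __name__ == "__main__":
    for run in failed_runs(SAMPLE):
        print(run.time, run.actor, run.cookbook, run.exit_code, run.detail)
```

Entries that are not cookbook END messages (manual notes, dbctl commits, conftool actions) are skipped rather than treated as errors, since the log mixes automated and hand-written lines.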

2022-08-02

  • 22:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 22:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 22:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 22:15 mutante: gerrit - syncing data (/srv/gerrit /var/lib/gerrit2/review_site /home) again after gerrit2002 was reimaged with buster T313250 T313972
  • 22:04 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 06s)
  • 22:04 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
  • 22:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs T308076
  • 21:53 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:29 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.23/extensions/CirrusSearch/includes/Sanity/Checker.php: Backport: Fix appending of join conds (T312421 T314439) (duration: 03m 15s)
  • 21:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:27 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - T314078
  • 21:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS buster
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:58 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.22 refs T308076
  • 20:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:53 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 20:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:51 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
  • 20:50 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.39.0-wmf.23 refs T308076
  • 20:38 mutante: re-imaging gerrit2002 with buster: it is on bullseye but needs git-fat, which has not been ported to python3 yet and otherwise blocks upgrading gerrit machines - T313250 T243027 T279509
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:36 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS buster
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:36 urbanecm: UTC evening B&C window done
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:33 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/HTMLTransformInput.php: 69e9152: ParsoidHandler: fix page bundle input with no orig HTML (duration: 03m 22s)
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:29 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.23/includes/Rest/Handler/ParsoidHandler.php: 322a960: ParsoidHandler: pass metrics object to HTMLTransformInput (duration: 03m 19s)
  • 20:27 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:27 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5fac0aa: GrowthExperiments: Remove wgGEHomepageTutorialTitle (duration: 03m 26s)
  • 20:06 dancy@deploy1002: Finished scap: Backport for gerrit:819612 Revert "Bump wikimedia/parsoid to 0.16.0-a18" (duration: 11m 30s)
  • 20:01 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 05s)
  • 20:01 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
  • 19:59 dancy@deploy1002: Finished deploy [gerrit/gerrit@94c5028]: (no justification provided) (duration: 00m 01s)
  • 19:59 dancy@deploy1002: Started deploy [gerrit/gerrit@94c5028]: (no justification provided)
  • 19:55 dancy@deploy1002: Started scap: Backport for gerrit:819612 Revert "Bump wikimedia/parsoid to 0.16.0-a18"
  • 19:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-tls
  • 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=varnish-fe
  • 19:37 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2034.codfw.wmnet,service=ats-be
  • 19:36 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-tls
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=varnish-fe
  • 19:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2033.codfw.wmnet,service=ats-be
  • 19:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2041,2046].codfw.wmnet
  • 19:35 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2041,2046].codfw.wmnet
  • 19:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for thanos-fe2002.codfw.wmnet
  • 19:28 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for thanos-fe2002.codfw.wmnet
  • 19:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe2010.codfw.wmnet
  • 19:26 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe2010.codfw.wmnet
  • 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-tls
  • 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=varnish-fe
  • 19:21 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2032.codfw.wmnet,service=ats-be
  • 19:17 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
  • 19:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mc2038.codfw.wmnet with reason: install
  • 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-tls
  • 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=varnish-fe
  • 19:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 19:11 mutante: gerrit1001 - rsyncing /home/ to gerrit2002:/srv/home-gerrit1001.wikimedia.org T313250
  • 19:01 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
  • 19:01 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit2002.wikimedia.org with reason: new machine
  • 18:55 dancy@deploy1002: Finished scap: testwikis wikis to 1.39.0-wmf.23 refs T308076 (duration: 50m 39s)
  • 18:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:52 ejegg: updated payments-wiki from 589bb64e to e1b6036a (just i18n changes in extensions)
  • 18:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:46 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (3 nodes at a time) for ElasticSearch cluster search_eqiad: deploy wmf-elasticsearch-search-plugins pkg - bking@cumin1001 - T314078
  • 18:46 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:45 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:41 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mc2038.codfw.wmnet
  • 18:41 rzl@cumin2002: START - Cookbook sre.hosts.remove-downtime for mc2038.codfw.wmnet
  • 18:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:18 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:18 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2038.codfw.wmnet with reason: install
  • 18:17 rzl@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw2038.codfw.wmnet with reason: install
  • 18:17 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2038.codfw.wmnet with reason: install
  • 18:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
  • 18:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: shutdown for PDU upgrade
  • 18:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:04 dancy@deploy1002: Started scap: testwikis wikis to 1.39.0-wmf.23 refs T308076
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32185 and previous config saved to /var/cache/conftool/dbconfig/20220802-175233-marostegui.json
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2159', diff saved to https://phabricator.wikimedia.org/P32184 and previous config saved to /var/cache/conftool/dbconfig/20220802-174311-ladsgroup.json
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32183 and previous config saved to /var/cache/conftool/dbconfig/20220802-173723-marostegui.json
  • 17:35 moritzm: installing node-moment security updates
  • 17:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: T310070
  • 17:32 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic[2041-2042,2057].codfw.wmnet with reason: T310070
  • 17:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 17:25 moritzm: installing fribidi security updates
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P32182 and previous config saved to /var/cache/conftool/dbconfig/20220802-172217-marostegui.json
  • 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-tls
  • 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=varnish-fe
  • 17:20 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 17:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32181 and previous config saved to /var/cache/conftool/dbconfig/20220802-170711-marostegui.json
  • 17:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2042-2043].codfw.wmnet with reason: shutdown for PDU upgrade
  • 17:05 Emperor: ms-be20[31,32,41,46].codfw.wmnet,ms-fe2010.codfw.wmnet,thanos-fe2002.codfw.wmnet downtime for PDU work T309957
  • 17:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T312972)', diff saved to https://phabricator.wikimedia.org/P32180 and previous config saved to /var/cache/conftool/dbconfig/20220802-170503-marostegui.json
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 17:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
  • 17:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 17:04 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 6 hosts with reason: shutdown for PDU replacement
  • 17:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 8 hosts with reason: Maintenance
  • 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 8 hosts with reason: Maintenance
  • 17:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32179 and previous config saved to /var/cache/conftool/dbconfig/20220802-170333-marostegui.json
  • 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-tls
  • 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=varnish-fe
  • 17:01 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet,service=ats-be
  • 17:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be[2030,2045,2052].codfw.wmnet
  • 17:00 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-be[2030,2045,2052].codfw.wmnet
  • 16:57 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1004.eqiad.wmnet
  • 16:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32178 and previous config saved to /var/cache/conftool/dbconfig/20220802-164827-marostegui.json
  • 16:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:35 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P32177 and previous config saved to /var/cache/conftool/dbconfig/20220802-163321-marostegui.json
  • 16:29 dancy@mwmaint1002: pull aborted: (duration: 00m 07s)
  • 16:25 rzl: rzl@stat1007:~$ sudo systemctl stop wmde-analytics-daily-early # wedged, timer will restart it now with max_runtime_seconds
  • 16:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32176 and previous config saved to /var/cache/conftool/dbconfig/20220802-161815-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T312972)', diff saved to https://phabricator.wikimedia.org/P32175 and previous config saved to /var/cache/conftool/dbconfig/20220802-161607-marostegui.json
  • 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32174 and previous config saved to /var/cache/conftool/dbconfig/20220802-161545-marostegui.json
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1004.eqiad.wmnet on all recursors
  • 16:10 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1004.eqiad.wmnet on all recursors
  • 16:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:05 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1004.eqiad.wmnet
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32173 and previous config saved to /var/cache/conftool/dbconfig/20220802-160039-marostegui.json
  • 15:51 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: T309957
  • 15:50 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2056.codfw.wmnet with reason: T309957
  • 15:49 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: T309957
  • 15:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2040.codfw.wmnet with reason: T309957
  • 15:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: T309957
  • 15:45 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2039.codfw.wmnet with reason: T309957
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P32172 and previous config saved to /var/cache/conftool/dbconfig/20220802-154533-marostegui.json
  • 15:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc[2040-2041].codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2037.codfw.wmnet
  • 15:36 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host elastic2037.codfw.wmnet
  • 15:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32171 and previous config saved to /var/cache/conftool/dbconfig/20220802-153027-marostegui.json
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T312972)', diff saved to https://phabricator.wikimedia.org/P32170 and previous config saved to /var/cache/conftool/dbconfig/20220802-152818-marostegui.json
  • 15:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32169 and previous config saved to /var/cache/conftool/dbconfig/20220802-152740-marostegui.json
  • 15:24 moritzm: installing gnupg2 security updates
  • 15:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc2024.codfw.wmnet with reason: shutdown for PDU upgrade
  • 15:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1004.eqiad.wmnet with OS buster
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32167 and previous config saved to /var/cache/conftool/dbconfig/20220802-151234-marostegui.json
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:10 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti-test[2001-2003].codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:08 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
  • 15:08 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on thanos-be2001.codfw.wmnet with reason: pdu
  • 15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on mc-gp2002.codfw.wmnet with reason: Power down for PDU maintenance, T310070
  • 15:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: T309957
  • 15:04 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2037.codfw.wmnet with reason: T309957
  • 15:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
  • 15:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: shutdown for PDU upgrade
  • 14:59 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: T309957
  • 14:59 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic2025.codfw.wmnet with reason: T309957
  • 14:58 oblivian@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P32166 and previous config saved to /var/cache/conftool/dbconfig/20220802-145728-marostegui.json
  • 14:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2060.codfw.wmnet with OS bullseye
  • 14:53 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
  • 14:50 moritzm: uploaded gnupg2 2.1.18-8~deb9u4+wmf1 to stretch-wikimedia
  • 14:50 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1004.eqiad.wmnet with reason: host reimage
  • 14:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32164 and previous config saved to /var/cache/conftool/dbconfig/20220802-144222-marostegui.json
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32163 and previous config saved to /var/cache/conftool/dbconfig/20220802-144013-marostegui.json
  • 14:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32162 and previous config saved to /var/cache/conftool/dbconfig/20220802-143952-marostegui.json
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetmaster1004.eqiad.wmnet with OS buster
  • 14:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
  • 14:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2060.codfw.wmnet with reason: host reimage
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32161 and previous config saved to /var/cache/conftool/dbconfig/20220802-142446-marostegui.json
  • 14:23 Emperor: shutdown ms-be20[30,45,52] for PDU work T309957
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 14:21 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be[2030,2045,2052].codfw.wmnet with reason: shutdown for PDU replacement
  • 14:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2060.codfw.wmnet with OS bullseye
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P32160 and previous config saved to /var/cache/conftool/dbconfig/20220802-140940-marostegui.json
  • 14:05 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster2004.codfw.wmnet with OS buster
  • 14:04 godog: grow sda/sdb 3 by 100G on thanos-be1001 - T314275
  • 14:03 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
  • 14:03 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on centrallog2002.codfw.wmnet with reason: pdu
  • 14:01 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
  • 14:01 root@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on prometheus2005.codfw.wmnet with reason: pdu
  • 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-tls
  • 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2032.codfw.wmnet,service=ats-be
  • 13:57 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet,service=ats-be
  • 13:56 godog: schedule poweroff for centrallog2002 at 16 utc - T310070
  • 13:54 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-be
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32159 and previous config saved to /var/cache/conftool/dbconfig/20220802-135435-marostegui.json
  • 13:53 godog: depool and poweroff prometheus2005 - T310070
  • 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
  • 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=ats-tls
  • 13:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[3-4].codfw.wmnet,service=varnish-fe
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32158 and previous config saved to /var/cache/conftool/dbconfig/20220802-135226-marostegui.json
  • 13:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:52 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
  • 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32157 and previous config saved to /var/cache/conftool/dbconfig/20220802-135155-marostegui.json
  • 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=ats-tls
  • 13:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp203[1-2].codfw.wmnet,service=varnish-fe
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=varnish-fe
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2030.codfw.wmnet,service=ats-be
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-tls
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=varnish-fe
  • 13:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet,service=ats-be
  • 13:45 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
  • 13:42 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster2004.codfw.wmnet with reason: host reimage
  • 13:42 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:42 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2013.codfw.wmnet with OS bullseye
  • 13:41 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:41 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:40 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable usage tracking for statement for cebwiki (T296384) – expected to gradually increase number of wbc_entity_usage and probably recentchanges rows on cebwiki, but not too much, see task for details (duration: 03m 06s)
  • 13:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2028.codfw.wmnet with OS bullseye
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32156 and previous config saved to /var/cache/conftool/dbconfig/20220802-133648-marostegui.json
  • 13:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:34 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Introduce $wmgEntityUsageModifierLimitsStatement (T296384) (2/2) (duration: 03m 21s)
  • 13:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:33 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:31 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Introduce $wmgEntityUsageModifierLimitsStatement (T296384) (1/2) (duration: 03m 16s)
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, T309957
  • 13:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on ganeti2028.codfw.wmnet with reason: Power down for PDU maintenance, T309957
  • 13:27 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster2004.codfw.wmnet with OS buster
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
  • 13:24 vgutierrez: restarting ATS 9.x instances to apply https://gerrit.wikimedia.org/r/819585 - T309651
  • 13:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P32155 and previous config saved to /var/cache/conftool/dbconfig/20220802-132142-marostegui.json
  • 13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2013.codfw.wmnet with reason: host reimage
  • 13:19 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2028.codfw.wmnet with reason: host reimage
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: a4499e5: Revert "testwiki: Add mediawiki.web_ui.interactions stream" (T314151, T311268) (duration: 03m 19s)
  • 13:10 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c2fb8a5: Enable RealtimePreview on Group 0 wikis (T314150) (duration: 03m 21s)
  • 13:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32154 and previous config saved to /var/cache/conftool/dbconfig/20220802-130636-marostegui.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T312972)', diff saved to https://phabricator.wikimedia.org/P32153 and previous config saved to /var/cache/conftool/dbconfig/20220802-130428-marostegui.json
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 13:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32152 and previous config saved to /var/cache/conftool/dbconfig/20220802-130351-marostegui.json
  • 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2013.codfw.wmnet with OS bullseye
  • 13:00 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2028.codfw.wmnet with OS bullseye
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ganeti2013.codfw.wmnet with reason: Remove node for eventual reimage, T311686
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32151 and previous config saved to /var/cache/conftool/dbconfig/20220802-124845-marostegui.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P32150 and previous config saved to /var/cache/conftool/dbconfig/20220802-123338-marostegui.json
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32149 and previous config saved to /var/cache/conftool/dbconfig/20220802-121832-marostegui.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T312972)', diff saved to https://phabricator.wikimedia.org/P32148 and previous config saved to /var/cache/conftool/dbconfig/20220802-121624-marostegui.json
  • 12:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:01 marostegui: dbmaint x1@eqiad T314087
  • 11:57 marostegui: dbmaint s7@eqiad T314377
  • 11:57 marostegui: dbmaint s3@eqiad T314377
  • 11:57 marostegui: dbmaint s8@eqiad T314377
  • 11:55 marostegui: dbmaint s8@eqiad T314377
  • 11:54 marostegui: dbmaint s3@eqiad T314377
  • 11:50 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:48 marostegui: dbmaint s7@eqiad T314377
  • 11:46 marostegui: dbmaint s4@eqiad T314377
  • 11:35 elukey: restart rsyslog on ml-serve1006
  • 10:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: T312626 btullis
  • 10:50 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1082.eqiad.wmnet with reason: T312626 btullis
  • 10:49 godog: grow sda3 by 100G on thanos-be2004 - T314275
  • 10:42 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 10:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 10:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: After restart', diff saved to https://phabricator.wikimedia.org/P32147 and previous config saved to /var/cache/conftool/dbconfig/20220802-103318-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: After restart', diff saved to https://phabricator.wikimedia.org/P32146 and previous config saved to /var/cache/conftool/dbconfig/20220802-101813-root.json
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'Add db2175 to s2 T311494', diff saved to https://phabricator.wikimedia.org/P32145 and previous config saved to /var/cache/conftool/dbconfig/20220802-101522-marostegui.json
  • 10:12 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1019.eqiad.wmnet with OS bullseye
  • 10:05 jynus: shutdown dbprov2002 backup2005 backup2008 T310070
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: After restart', diff saved to https://phabricator.wikimedia.org/P32144 and previous config saved to /var/cache/conftool/dbconfig/20220802-100308-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32143 and previous config saved to /var/cache/conftool/dbconfig/20220802-100304-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db2079 from dbctl T313885', diff saved to https://phabricator.wikimedia.org/P32141 and previous config saved to /var/cache/conftool/dbconfig/20220802-095455-marostegui.json
  • 09:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
  • 09:49 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1019.eqiad.wmnet with reason: host reimage
  • 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: After restart', diff saved to https://phabricator.wikimedia.org/P32140 and previous config saved to /var/cache/conftool/dbconfig/20220802-094804-root.json
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32139 and previous config saved to /var/cache/conftool/dbconfig/20220802-094759-root.json
  • 09:44 godog: grow sdb3 by 100G on thanos-be2004 - T314275
  • 09:43 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 09:42 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 09:37 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1019.eqiad.wmnet with OS bullseye
  • 09:36 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: After restart', diff saved to https://phabricator.wikimedia.org/P32138 and previous config saved to /var/cache/conftool/dbconfig/20220802-093259-root.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32137 and previous config saved to /var/cache/conftool/dbconfig/20220802-093254-root.json
  • 09:30 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 09:30 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 09:28 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 09:26 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 09:25 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 09:22 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 5%: After restart', diff saved to https://phabricator.wikimedia.org/P32136 and previous config saved to /var/cache/conftool/dbconfig/20220802-091754-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32135 and previous config saved to /var/cache/conftool/dbconfig/20220802-091749-root.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2143', diff saved to https://phabricator.wikimedia.org/P32134 and previous config saved to /var/cache/conftool/dbconfig/20220802-091518-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 2%: After restart', diff saved to https://phabricator.wikimedia.org/P32133 and previous config saved to /var/cache/conftool/dbconfig/20220802-090250-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32132 and previous config saved to /var/cache/conftool/dbconfig/20220802-090245-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 1%: After restart', diff saved to https://phabricator.wikimedia.org/P32131 and previous config saved to /var/cache/conftool/dbconfig/20220802-084745-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: After maintenance', diff saved to https://phabricator.wikimedia.org/P32130 and previous config saved to /var/cache/conftool/dbconfig/20220802-084740-root.json
  • 08:46 marostegui: stop mysql on db2095 db2107 db2109 db2137 db2147 db2159 db2160 pc2012 for pdu maintenance on codfw b5 T310070
  • 07:49 moritzm: upgrading drmrs ganeti clusters to 3.0.2 T312637
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to plain disks, T311686
  • 07:22 godog: bounce icinga on alert2001 - T314353
  • 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 07:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on kubetcd2005.codfw.wmnet with reason: Switch instance to DRBD, T311686
  • 06:58 elukey: restart rsyslog on ml-serve2006
  • 06:56 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/extensions/FlaggedRevs/maintenance/pruneRevData.php: Backport: pruneRevData: Make cleaning in larger batches (T296380) (duration: 03m 26s)
  • 06:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:46 godog: bounce icinga on alert1001 - T314353
  • 05:48 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db2088.codfw.wmnet
  • 05:48 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:44 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2088.codfw.wmnet
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P32127 and previous config saved to /var/cache/conftool/dbconfig/20220802-052923-root.json
  • 05:24 marostegui: dbmaint x1@eqiad T314087
  • 04:17 ryankemper: [Elastic] Small amendment to my earlier statement; based off epoch time `be_x_oldwiki_titlesuggest_1659407912` was not an old index hanging around after a reindex operation, but rather the new one that the reindex operation was trying to create, but had not yet finished (therefore didn't switch over the aliases). It presumably got interrupted by the reimage of `elastic2059`.
  • 04:15 ryankemper: [Elastic] Blew away red index like so: `ryankemper@cumin1001:~$ curl -XDELETE https://search.svc.codfw.wmnet:9243/be_x_oldwiki_titlesuggest_1659407912`. Cluster is back to `green` status.
  • 04:07 ryankemper: [Elastic] Per `curl -s https://search.svc.codfw.wmnet:9243/_cat/aliases | grep -i be_x` I see `be_x_oldwiki_titlesuggest` alias points to `be_x_oldwiki_titlesuggest_1658396688`. I think this means the red index is an old index from an in-progress reindex operation. I likely just need to delete `be_x_oldwiki_titlesuggest_1659407912` but doing some quick digging first
  • 04:04 ryankemper: [Elastic] Red cluster status in main codfw elasticsearch cluster (`https://search.svc.codfw.wmnet:9243`); culprit appears to be index `be_x_oldwiki_titlesuggest_1659407912`. Confusingly it has 2 replicas set so it's not clear to me how we got into this state starting from green (in the past we've gone into red status from indices that erroneously had 0 replicas in production)
  • 03:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:40 krinkle@deploy1002: Synchronized multiversion/: I0802db272695 (duration: 03m 10s)
  • 03:40 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:34 krinkle@deploy1002: Synchronized wmf-config/: I9b89c0ff5c2 (duration: 03m 32s)
  • 03:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:27 krinkle@deploy1002: Synchronized multiversion/: I6e97d39a3, Ib843ebced31 (duration: 03m 30s)
  • 03:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:22 krinkle@mwmaint1002: pull aborted: (duration: 00m 11s)
  • 03:21 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I39a2b86065 (duration: 03m 19s)
  • 03:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host elastic2059.codfw.wmnet with OS bullseye
  • 03:15 krinkle@deploy1002: Synchronized multiversion/: Ieaea60 (duration: 03m 03s)
  • 03:14 krinkle@mwmaint2002: pull aborted: (duration: 01m 36s)
  • 03:14 krinkle@mwmaint1002: pull aborted: (duration: 01m 31s)
  • 03:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
  • 02:54 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph.service` to clear `Query Service HTTP Port` && `WDQS SPARQL` alerts
  • 02:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2059.codfw.wmnet with reason: host reimage
  • 02:36 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2059.codfw.wmnet with OS bullseye
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:35 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Ieaea60a991e5 (duration: 03m 10s)
  • 00:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:23 krinkle@deploy1002: Synchronized multiversion/: Ia3406e (duration: 03m 22s)
  • 00:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:16 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:16 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-08-01

  • 23:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Id1ce285631f5, I194d419fbfe (duration: 03m 09s)
  • 23:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 23:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 23:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 23:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 moritzm: drain ganeti2028 T309957
  • 21:03 mutante: gerrit2002 - mkdir /var/lib/gerrit2/review_site | gerrit1001 - rsyncing /var/lib/gerrit2/review_site/ to gerrit2002 T313250 T313972
  • 21:01 urbanecm: UTC late backport window done
  • 21:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 461e070: itwiki: Change robot policy on NS2 and NS3 (T314165) (duration: 03m 18s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mutante: phab1001 - rsyncing repo data /srv/repos/ to phab2002 (in addition to phab1004 previously) T313360
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:55 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mnwwiktionary --fix # T314023
  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ba8c177: mnwwiktionary: Create Appendix namespace (T314023) (duration: 03m 09s)
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript updateArticleCount.php --wiki=viwikibooks --update # T314239
  • 20:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c19c3e36ab: DiscussionTools: Make new reply buttons available at mediawiki.org (T314076); 24db016c4: viwikibooks: Change wgArticleCountMethod to any (T314239) (duration: 03m 10s)
  • 20:35 daniel@deploy1002: Synchronized php-1.39.0-wmf.22/includes/Rest/Handler: Fix: Parsoid REST handler: allow pagebundle input without original HTML. (duration: 03m 15s)
  • 20:25 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-ne.svg (T311700)
  • 20:21 daniel@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-wordmark-ne.svg: Config: newiki: Update wordmark (T311700) (duration: 03m 17s)
  • 20:17 daniel@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: newiki: Update wordmark (T311700) (duration: 03m 32s)
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2054.codfw.wmnet with OS bullseye
  • 19:41 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
  • 19:35 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2054.codfw.wmnet with reason: host reimage
  • 19:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2054.codfw.wmnet with OS bullseye
  • 18:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2031.codfw.wmnet with OS bullseye
  • 18:44 mutante: gitlab - moved data_persistence group to new parent, under /repos/
  • 18:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
  • 18:32 mutante: gitlab - created group 'data_persistence' - added Ladsgroup and upgraded from member to maintainer
  • 18:27 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2031.codfw.wmnet with reason: host reimage
  • 18:12 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2031.codfw.wmnet with OS bullseye
  • 17:58 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2025.codfw.wmnet with OS bullseye
  • 17:37 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
  • 17:31 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2025.codfw.wmnet with reason: host reimage
  • 17:18 ryankemper: T289135 T314078 Manually reimaging remaining codfw stretch hosts (`elastic[2025,2031,2054,2059-2060]`) to bullseye, one host at a time, waiting for green cluster status to return between each run. `ryankemper@cumin1001` tmux session `codfw_reimage`
  • 17:16 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2025.codfw.wmnet with OS bullseye
  • 17:08 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 17:08 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 17:06 mutante: alert1001 - systemctl restart nsca - pinged by fundraising tech because fundraising hosts have the "passive check is awol" issue again (T196336)
  • 16:25 moritzm: installing tcpdump updates from bullseye point release
  • 16:23 cwhite@puppetmaster1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
  • 16:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1018.eqiad.wmnet with OS bullseye
  • 16:10 cwhite@puppetmaster1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kibana7,name=logstash2023.codfw.wmnet
  • 15:57 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
  • 15:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1018.eqiad.wmnet with reason: host reimage
  • 15:41 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1018.eqiad.wmnet with OS bullseye
  • 15:39 mvernon@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:29 mvernon@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1016.eqiad.wmnet: Canary testing of 3.11.13 on Restbase T309896 - mvernon@cumin1001
  • 15:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Beta: add configuration for redirect badges (T313896) (2/2, should be a no-op) (duration: 03m 30s)
  • 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Beta: add configuration for redirect badges (T313896) (1/2, should be a no-op) (duration: 03m 15s)
  • 15:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:54 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:53 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:42 moritzm: installing openjdk-11 security updates
  • 14:39 btullis@puppetmaster1001: conftool action : set/pooled=inactive; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:39 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:38 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 14:34 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:28 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 14:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:13 urbanecm@deploy1002: Synchronized php-1.39.0-wmf.22/skins/Vector/: b5007c5: Revert "styles: Unify on standard external link icon" (duration: 03m 16s)
  • 14:12 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 14:12 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 14:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:05 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 14:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2044.codfw.wmnet with OS bullseye
  • 14:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: bcb7b0d: Adjust width-height ratio of logo to fix display issue (T310961; 2/2) (duration: 03m 17s)
  • 14:04 urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/srwikisource{.png;-1.5x.png;-2x.png} (T310961)
  • 14:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:01 urbanecm@deploy1002: Synchronized static/images/project-logos/: bcb7b0d: srwikisource: Adjust width-height ratio of logo to fix display issue (T310961; 1/2) (duration: 03m 41s)
  • 14:01 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:01 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 urbanecm: UTC afternoon backport window is going to overflow by a couple of minutes
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
  • 13:44 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2044.codfw.wmnet with reason: host reimage
  • 13:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host elastic2044.codfw.wmnet with OS bullseye
  • 13:22 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REIMAGE (1 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster reimage (bullseye upgrade) - bking@cumin1001 - T289135
  • 11:50 moritzm: installing openjdk-8 security updates for stretch
  • 11:43 moritzm: uploaded openjdk-8 8u342-b07-1~deb9u1 for stretch-wikimedia
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32124 and previous config saved to /var/cache/conftool/dbconfig/20220801-102714-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32123 and previous config saved to /var/cache/conftool/dbconfig/20220801-101208-ladsgroup.json
  • 10:09 vgutierrez: test ATS 9.1.2 on cp6016 - T309651
  • 10:05 vgutierrez: test ATS 9.1.2 on cp6008 - T309651
  • 10:00 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@4da9195]: (no justification provided) (duration: 00m 19s)
  • 10:00 ebysans@deploy1002: Started deploy [airflow-dags/analytics@4da9195]: (no justification provided)
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P32122 and previous config saved to /var/cache/conftool/dbconfig/20220801-095702-ladsgroup.json
  • 09:56 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@85585b0]: (no justification provided) (duration: 00m 05s)
  • 09:56 ebysans@deploy1002: Started deploy [airflow-dags/analytics@85585b0]: (no justification provided)
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32121 and previous config saved to /var/cache/conftool/dbconfig/20220801-094156-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T314041)', diff saved to https://phabricator.wikimedia.org/P32120 and previous config saved to /var/cache/conftool/dbconfig/20220801-093845-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: Maintenance
  • 09:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 09:21 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2004.codfw.wmnet
  • 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2004.codfw.wmnet
  • 09:10 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2003.codfw.wmnet
  • 09:01 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2003.codfw.wmnet
  • 09:00 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner2002.codfw.wmnet
  • 08:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:53 ladsgroup@deploy1002: Synchronized php-1.39.0-wmf.22/includes/api: Backport: api: Support for links migration in ApiQueryBacklinks (T312865 T314112) (duration: 03m 01s)
  • 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:50 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner2002.codfw.wmnet
  • 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1004.eqiad.wmnet
  • 08:48 godog: thanos-be2004: copy quarantined and tmp off sdb3 and into sdb4 for analysis and to free space - T314275
  • 08:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:47 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Stop writing to the old templatelinks columns in itwikisource (T312865) (duration: 03m 12s)
  • 08:43 vgutierrez: rolling upgrade of HAProxy to version 2.4.18
  • 08:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:41 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:39 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1004.eqiad.wmnet
  • 08:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1003.eqiad.wmnet
  • 08:28 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1003.eqiad.wmnet
  • 08:25 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab-runner1002.eqiad.wmnet
  • 08:14 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab-runner1002.eqiad.wmnet
  • 06:19 oblivian@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=(appservers|api)-ro,name=codfw
  • 06:14 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appservers-ro
  • 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=appserver-ro
  • 06:13 oblivian@puppetmaster1001: conftool action : set/ttl=10; selector: dnsdisc=(appserver|api)-ro
  • 05:43 moritzm: installing Linux 5.10.127-2 on Gitlab runners
  • 01:00 krinkle@deploy1002: Synchronized multiversion/: Ic0dbcb (duration: 03m 31s)
  • 00:57 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:53 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 00:45 krinkle@deploy1002: Synchronized multiversion/MWMultiVersion.php: I9d363abd7cfef (duration: 03m 17s)
  • 00:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 00:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 00:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 00:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
