Server Admin Log/Archive 45

2021-07-31

2021-07-30

  • 22:44 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' .
  • 22:31 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 22:22 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 22:20 eileen: civicrm revision is 158ed65e00, config revision is 6011d9c471
  • 21:51 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 21:50 legoktm@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 21:50 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 21:49 legoktm@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:48 legoktm@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 21:47 legoktm@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 21:46 legoktm@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 20:39 ottomata: wiping kafka jumbo cluster in deployment-prep beta
  • 19:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare wd_propertysuggester streams - T287760 (duration: 00m 57s)
  • 16:13 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:10 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1298-1299].eqiad.wmnet
  • 15:30 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:20 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1298-1299].eqiad.wmnet
  • 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[8-9].eqiad.wmnet
  • 15:19 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[5-6].eqiad.wmnet
  • 15:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1295-1296].eqiad.wmnet
  • 15:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1295-1296].eqiad.wmnet
  • 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 mutante: mw1439, mw1440, mw1445, mw1446 - scap pull, repool as jobrunners after reimaging
  • 14:55 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw144[5-6].eqiad.wmnet
  • 14:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
  • 14:52 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw144[5-6].eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1440.eqiad.wmnet
  • 14:46 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw1439.eqiad.wmnet
  • 14:39 topranks: Setting up BGP peering to Xiber LLC AS393950 on cr2-eqord, Equinix Chicago exchange.
  • 14:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 14:19 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 14:18 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 14:17 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 14:16 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 14:15 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 14:14 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 14:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 13:57 mutante: mw1439,mw1440,mw1445,mw1446 - converting from app/API to jobrunners - reimaging for row balance in eqiad
  • 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1439-1440].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
  • 13:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw[1445-1446].eqiad.wmnet with reason: reimage
  • 13:26 joe: uploaded docker-report 0.0.13 to buster
  • 13:23 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw144[5-6].eqiad.wmnet
  • 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1440.eqiad.wmnet
  • 13:22 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1439.eqiad.wmnet
  • 11:23 moritzm: installing libsndfile security updates on stretch
  • 09:38 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily (duration: 00m 21s)
  • 09:37 mbsantos@deploy1002: Started deploy [kartotherian/deploy@289d3a9]: Add public source to render tegola MVT in maps2007 temporarily
  • 09:32 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007 (duration: 00m 21s)
  • 09:32 mbsantos@deploy1002: Started deploy [kartotherian/deploy@c6cfa85]: Add non-public source to render tegola MVT in maps2007
  • 08:56 topranks: running homer against asw2-a-eqiad and asw2-b-eqiad to bring homer in line with manual config added for buffer mem. T284592
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16934 and previous config saved to /var/cache/conftool/dbconfig/20210730-062545-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16933 and previous config saved to /var/cache/conftool/dbconfig/20210730-061041-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16932 and previous config saved to /var/cache/conftool/dbconfig/20210730-055537-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16931 and previous config saved to /var/cache/conftool/dbconfig/20210730-054031-root.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16930 and previous config saved to /var/cache/conftool/dbconfig/20210730-052527-root.json
  • 05:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/tests/phpunit/includes/media/PNGMetadataExtractorTest.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16929 and previous config saved to /var/cache/conftool/dbconfig/20210730-051024-root.json
  • 04:56 tstarling@deploy1002: Synchronized php-1.37.0-wmf.16/includes/media/PNGMetadataExtractor.php: fix broken PNG thumbnails T286273 (duration: 00m 57s)
  • 04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P16928 and previous config saved to /var/cache/conftool/dbconfig/20210730-045520-root.json
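
Note: the db2104 entries above show a staged repool (5% at 04:55 up to 100% at 06:25). The sketch below is only an illustration of how that ramp could be read back out of the log text. The `SAMPLE` lines are shortened copies of those entries, and the helper names `repool_steps` and `intervals` are hypothetical; they are not part of dbctl or any Wikimedia tooling.

```python
import re

# Shortened copies of the db2104 entries from the 2021-07-30 log (illustrative only).
SAMPLE = [
    "06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: ...'",
    "06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: ...'",
    "05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: ...'",
    "05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: ...'",
    "05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 15%: ...'",
    "05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: ...'",
    "04:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: ...'",
]

# Matches the HH:MM timestamp and the percentage in a "(re)pooling @ N%" message.
ENTRY = re.compile(r"(\d{2}):(\d{2}) .*?\(re\)pooling @ (\d+)%")


def repool_steps(lines):
    """Return (minutes_since_midnight, percent) pairs in chronological order."""
    steps = []
    for line in lines:
        match = ENTRY.search(line)
        if match:
            hours, minutes, percent = (int(g) for g in match.groups())
            steps.append((hours * 60 + minutes, percent))
    return sorted(steps)


def intervals(steps):
    """Return the gap in minutes between consecutive repool steps."""
    return [later[0] - earlier[0] for earlier, later in zip(steps, steps[1:])]


if __name__ == "__main__":
    steps = repool_steps(SAMPLE)
    print([percent for _, percent in steps])
    print(intervals(steps))
```

Read this way, the entries form a seven-step ramp with a fixed gap between commits; the same pattern could be applied to other "(re)pooling" entries in this archive.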

2021-07-29

  • 23:41 derick@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Merge new configs with existing testwiki definition (duration: 00m 57s)
  • 21:11 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.16 refs T281157
  • 20:59 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/Wikibase/client: Backport: Let language parameter accept null in Scribunto_LuaWikibaseEntityLibrary (T287704) (duration: 01m 09s)
  • 19:27 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15 refs T281157
  • 19:19 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.16 refs T281157
  • 18:37 urbanecm@deploy1002: Finished scap: 796fe8e: 927763c: SecurePoll backports (T283728, T284585) (duration: 17m 06s)
  • 18:19 urbanecm@deploy1002: Started scap: 796fe8e: 927763c: SecurePoll backports (T283728, T284585)
  • 18:19 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.16/extensions/GlobalWatchlist/modules/SiteDisplay.js: 9a2383d: Display: Use HTML "dir" attribute for ltr/rtl (T287649) (duration: 01m 25s)
  • 18:11 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons. - razzi@cumin1001
  • 16:27 andrewbogott: adding uid=mdipietro,ou=people,dc=wikimedia,dc=org to cn=ops,ou=groups,dc=wikimedia,dc=org in ldap
  • 15:23 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:11 mmandere: pool lvs1013.eqiad.wmnet - T286032
  • 15:09 mmandere: pool dns1001.wikimedia.org - T286032
  • 15:07 mmandere: pool cp107[5-8].eqiad.wmnet - T286032
  • 14:48 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:48 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1013.eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:46 mmandere: depool lvs1013 - T286032
  • 14:45 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
  • 14:45 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1001.wikimedia.org with reason: Eqiad row A maintenance
  • 14:39 mmandere: depool dns1001 - T286032
  • 14:39 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
  • 14:38 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
  • 14:38 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:38 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1075-1078].eqiad.wmnet with reason: Eqiad row A maintenance
  • 14:35 mmandere: depool cp107[5-8].eqiad.wmnet - T286032
  • 14:19 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:11 vgutierrez: restart pybal on lvs2009
  • 14:09 vgutierrez: restart pybal on lvs2010
  • 14:07 vgutierrez: restart pybal on lvs2008
  • 14:05 vgutierrez: restart pybal on lvs2007
  • 13:59 vgutierrez: restart pybal on lvs1014
  • 13:55 vgutierrez: restart pybal on lvs1015
  • 13:52 _joe_: restarting pybal on lvs1016
  • 12:56 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola (duration: 00m 21s)
  • 12:55 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@1e31cc6]: Increase mirrored traffic to tegola
  • 12:22 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti2026.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
  • 10:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2104.codfw.wmnet with reason: REIMAGE
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2104 T287230', diff saved to https://phabricator.wikimedia.org/P16925 and previous config saved to /var/cache/conftool/dbconfig/20210729-102753-marostegui.json
  • 09:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola (duration: 00m 22s)
  • 09:59 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@6960d32]: Increase mirrored traffic to tegola
  • 09:47 moritzm: installing Mariadb 10.3.29 updates from Buster point release (as packaged in Debian, not the WMF DB packages)
  • 09:40 jelto: uncordon kubestage1002.eqiad.wmnet as rsyslog was restarted and log shipping to logstash works again
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:33 moritzm: purging obsolete kernels from moscovium (disk space alerts for /)
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moscovium.eqiad.wmnet
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host moscovium.eqiad.wmnet
  • 07:55 elukey: roll restart uwsgi + celery on ores[12]* nodes to pick up aspell upgrades
  • 07:53 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 07:52 moritzm: restarting Tomcat on idp-test
  • 06:41 XioNoX: push pfw policies - T287203
  • 05:44 Amir1: adding "comunicaciones AT wikimediacolombia.org" as owner of wikimedia-co mailing list
  • 01:08 eileen: civicrm revision changed from 739c936298 to 158ed65e00, config revision is 6011d9c471

2021-07-28

  • 23:57 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: wgSkipSkins: Update defaults, hide modern (T287616) (duration: 01m 06s)
  • 23:50 thcipriani@deploy1002: Synchronized wmf-config: Config: Disable mobile contributions simplifications on Wikidata and Commons (T283988) (duration: 01m 58s)
  • 19:16 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.16 refs T281157 (duration: 01m 06s)
  • 19:15 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.16 refs T281157
  • 19:09 twentyafterfour: Preparing to deploy 1.37.0-wmf.16 to group1 wikis
  • 18:57 legoktm: mwmaint2002$ foreachwikiindblist wikimania refreshLinks.php - to start populating DPL tracking category
  • 18:36 legoktm@deploy1002: Finished scap: Add a tracking category to pages using the <DynamicPageList> tag (duration: 27m 16s)
  • 18:14 jbond: manually cleared out the puppetdb2002 queue
  • 18:08 legoktm@deploy1002: Started scap: Add a tracking category to pages using the <DynamicPageList> tag
  • 16:37 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 16:00 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 15:59 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:58 ryankemper: T287112 [WDQS] Re-pooled `wdqs2002`
  • 15:57 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@26273d8]: 0.3.77 (duration: 08m 55s)
  • 15:53 mutante: mw1434,mw1435,mw1436: scap pull, repooled, reimaged, converted from API to appserver for balancing (T279309)
  • 15:53 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[4-6].eqiad.wmnet
  • 15:52 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[4-6].eqiad.wmnet
  • 15:51 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.77` on canary `wdqs1003`; proceeding to rest of fleet
  • 15:48 ryankemper@deploy1002: Started deploy [wdqs/wdqs@26273d8]: 0.3.77
  • 15:47 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.77`. Pre-deploy tests passing on canary `wdqs1003`
  • 15:47 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 14:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 14:39 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
  • 14:33 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on mw1434.eqiad.wmnet with reason: known issue
  • 14:19 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 14:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 14:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 14:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 13:32 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw143[4-6].eqiad.wmnet
  • 13:29 moritzm: installing python2.7 security updates on stretch
  • 13:08 moritzm: installing python3.5 security updates on stretch
  • 12:27 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 11:27 moritzm: installing nginx security updates on thumbor*
  • 11:18 moritzm: installing nginx security updates on sodium (mirrors.wikimedia.org)
  • 11:03 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:03 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:11 moritzm: installing remaining nginx security updates on stretch
  • 10:09 godog: temp fix prometheus-icinga-am on alert1001
  • 09:40 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:40 urbanecm: Start server-side upload for 1 video file (T287482)
  • 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:29 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:24 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
  • 08:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1122.eqiad.wmnet with reason: REIMAGE
  • 08:27 Amir1: running several long-running queries against pc1007
  • 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:01 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 07:53 moritzm: installing aspell security updates on stretch
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 29 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 29 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 40 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 40 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 6 hosts with reason: T287559
  • 07:20 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 6 hosts with reason: T287559
  • 07:07 godog: remove cloud*/syslog.log from centrallog2001 - T287559
  • 07:06 godog: remove node_pinger.prom from node-pinger hosts
  • 06:42 godog: remove obsolete user.log.manual-rotation from centrallog1001 to free disk space
  • 02:43 TimStarling: on mwmaint2002 fixing T286273 broken files using eval.php

2021-07-27

  • 23:53 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.16/skins/Vector: Backport: Restore print, links, table and message box styles (T278896) (duration: 01m 07s)
  • 23:15 cjming@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable user links on office + test wikis (T287391) (duration: 02m 00s)
  • 20:44 ryankemper: [WDQS] Returning `wdqs` dns discovery to the expected status of `(eqiad, codfw) = (depooled, pooled)`: `sudo confctl --object-type discovery select 'dnsdisc=wdqs,name=eqiad' set/pooled=false`
  • 20:44 legoktm: legoktm@wtp1025:~$ sudo systemctl restart php7.2-fpm # restart php-fpm, opcache hit ratio was warning
  • 20:43 ryankemper@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 20:35 twentyafterfour@deploy1002: Pruned MediaWiki: 1.37.0-wmf.14 (duration: 03m 12s)
  • 20:25 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.16
  • 20:02 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.16 (duration: 36m 06s)
  • 19:26 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.16
  • 18:49 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate EchoMail and EchoInteraction to EventGate - T287210 (duration: 02m 28s)
  • 18:17 twentyafterfour: MediaWiki branch `1.37.0-wmf.16` prepped and patched in preparation for the upcoming deployment window.
  • 17:50 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:47 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 17:28 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2022.codfw.wmnet
  • 17:18 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2022.codfw.wmnet
  • 17:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2021.codfw.wmnet
  • 16:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 16:49 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 16:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 16:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2021.codfw.wmnet
  • 16:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2020.codfw.wmnet
  • 16:43 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 16:42 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 16:37 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 16:34 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash2020.codfw.wmnet
  • 16:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 16:21 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 16:14 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:42 elukey: add disk_template drbd back to ml-serve-ctrl100[12] vms after performance testing - T287238
  • 15:22 dcausse: cirrus: reindexing 823 wikis in elastic@[eqiad, codfw and cloudelastic] to apply new mapping (weighted_tags) T147505
  • 15:22 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 15:17 mmandere: pool lvs1014.eqiad.wmnet - T286061
  • 15:16 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 15:16 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 15:11 marostegui: Move m1-master from dbproxy1012 to dbproxy1014 T286061
  • 15:11 mmandere: pool authdns1001.wikimedia.org - T286061
  • 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 15:09 mmandere: pool cp10[79-82].eqiad.wmnet - T286061
  • 15:05 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 14:57 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 14:55 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:55 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1014.eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16917 and previous config saved to /var/cache/conftool/dbconfig/20210727-145352-marostegui.json
  • 14:53 moritzm: disabling puppet for upcoming row B maintenance
  • 14:52 mmandere: depool lvs1014 - T286061
  • 14:52 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 14:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 14:51 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
  • 14:51 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on authdns1001.wikimedia.org with reason: Eqiad row B maintenance
  • 14:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:47 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 14:47 mmandere@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:46 mmandere@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1079-1082].eqiad.wmnet with reason: Eqiad row B maintenance
  • 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
  • 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 14:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1129.eqiad.wmnet with reason: REIMAGE
  • 14:40 mmandere: depool authdns1001 - T286061
  • 14:40 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-serve-ctrl2002.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 14:34 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 14:33 mmandere: depool cp10[79-82].eqiad.wmnet - T286061
  • 14:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 14:30 topranks: Add peering to AS398196 - Cobalt Ridge at DE-CIX Dallas on cr2-codfw.
  • 14:29 elukey: reduce vcores for ml-serve-ctrl[12]00[12] after performance testing - T287238
  • 14:28 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 14:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 T287230', diff saved to https://phabricator.wikimedia.org/P16916 and previous config saved to /var/cache/conftool/dbconfig/20210727-142520-marostegui.json
  • 14:19 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:16 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:13 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 14:11 moritzm: installing aspell security updates
  • 14:11 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 14:03 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'canary' .
  • 14:00 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics-external' for release 'production' .
  • 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:59 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:54 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 13:52 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 13:42 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5 (duration: 02m 29s)
  • 13:40 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:39 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Deploy v2.10.4-wmf5
  • 13:36 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:34 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:30 ottomata: deploying eventgate-analytics with native prometheus support. Doing this slowly on canary release first to ensure https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload is fixed.
  • 13:29 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 12:56 elukey: created component/iptables185 for buster-wikimedia + imported packages from buster-backports
  • 12:50 dcausse@deploy1002: Finished deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided) (duration: 06m 13s)
  • 12:43 dcausse@deploy1002: Started deploy [wikimedia/discovery/analytics@346ac10]: (no justification provided)
  • 11:23 Lucas_WMDE: EU backport+config window done
  • 11:20 oblivian@deploy1002: Synchronized debug.json: Config: Add the experimental kubernetes backend to mwdebug (T283056) (duration: 00m 56s)
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add stream configuration for ContentTranslation events (T281982) (duration: 00m 58s)
  • 10:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1269.eqiad.wmnet
  • 10:16 jelto: gitlab-ansible playbook on gitlab2001.wikimedia.org END (PASS)
  • 10:11 mutante: replacing scap proxies: mw1269 with mw1420, mw1285 with mw1306
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16909 and previous config saved to /var/cache/conftool/dbconfig/20210727-101053-root.json
  • 10:10 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1269.eqiad.wmnet
  • 10:06 jelto: running gitlab-ansible playbook on gitlab2001.wikimedia.org
  • 10:05 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1269.eqiad.wmnet
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16908 and previous config saved to /var/cache/conftool/dbconfig/20210727-095549-root.json
  • 09:52 jynus: reverting query killer parameters on s3 codfw replicas
  • 09:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1285.eqiad.wmnet
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16906 and previous config saved to /var/cache/conftool/dbconfig/20210727-094046-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16905 and previous config saved to /var/cache/conftool/dbconfig/20210727-092542-root.json
  • 09:13 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1285.eqiad.wmnet
  • 09:12 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1285.eqiad.wmnet
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 15%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16904 and previous config saved to /var/cache/conftool/dbconfig/20210727-091038-root.json
  • 09:04 _joe_: restarting pybal on lvs2009 to pick up the new api depool threshold
  • 08:57 _joe_: repooling mw225[12] for apis
  • 08:56 _joe_: restart pybal on lvs2010 to pick up the depool threshold change
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16902 and previous config saved to /var/cache/conftool/dbconfig/20210727-085535-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2147 (re)pooling @ 5%: After mariadb restart and upgrade', diff saved to https://phabricator.wikimedia.org/P16901 and previous config saved to /var/cache/conftool/dbconfig/20210727-084031-root.json
  • 08:36 jynus: reenabled puppet on mwmaint1002
  • 08:29 volans@deploy1002: Finished deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next (duration: 01m 01s)
  • 08:28 volans@deploy1002: Started deploy [netbox/deploy@660ad14]: Test v2.10.4-wmf5 on netbox-next
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2147 to restart mysql', diff saved to https://phabricator.wikimedia.org/P16900 and previous config saved to /var/cache/conftool/dbconfig/20210727-082820-marostegui.json
  • 07:52 jynus: disabling puppet on mwmaint1002
  • 07:14 moritzm: installing krb security updates on buster
  • 06:50 elukey: install iptables from buster-backports (manually) on ml-serve-ctrl200[1,2] as test (+ reboot the nodes for a clean start) - T287238
  • 06:20 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable request language for RDF stubs in testwikidatawiki (T285795), Part II (duration: 00m 56s)
  • 06:18 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable request language for RDF stubs in testwikidatawiki (T285795), Part I (duration: 00m 57s)
  • 05:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1162.eqiad.wmnet with reason: REIMAGE
  • 05:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162 T287230', diff saved to https://phabricator.wikimedia.org/P16899 and previous config saved to /var/cache/conftool/dbconfig/20210727-051212-marostegui.json

2021-07-26

  • 23:37 legoktm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/Score/includes/Score.php: Increase lilypond version cache TTL to 1 hour (duration: 00m 57s)
  • 18:30 cstone: SmashPig revision changed from be272c02ce to 020d4eccd4
  • 17:41 legoktm: ran `scap pull` and repooled mw2336.codfw.wmnet - T287394
  • 17:41 legoktm@cumin1001: conftool action : set/pooled=yes; selector: name=mw2336.codfw.wmnet
  • 17:40 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
  • 17:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1002.eqiad.wmnet with reason: REIMAGE
  • 16:06 legoktm: depooled mw2336.codfw.wmnet, mgmt is down too. T287394
  • 16:04 legoktm@cumin1001: conftool action : set/pooled=no; selector: name=mw2336.codfw.wmnet
  • 15:29 hashar: Restarted gerrit replica on gerrit2001.wikimedia.org # T287122
  • 15:24 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/AbuseFilterHooks.php: Backport: Don’t generate current content text twice, Part II (duration: 01m 49s)
  • 15:21 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/AbuseFilter/includes/VariableGenerator/RunVariableGenerator.php: Backport: Don’t generate current content text twice, Part I (duration: 01m 50s)
  • 15:19 topranks: Adding peering to AS139931 - Bangladesh Submarine Cable Company - at Equinix Singapore on cr3-eqsin
  • 14:42 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:42 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:55 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable DPL on ruwikinews (duration: 00m 27s)
  • 10:53 ladsgroup@deploy1002: Scap failed!: 3/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 10:52 ladsgroup@deploy1002: Scap failed!: 2/6 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 10:51 jynus: deploying 10 second mw user query limit on s3 codfw replicas
  • 10:49 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db2149', diff saved to https://phabricator.wikimedia.org/P16895 and previous config saved to /var/cache/conftool/dbconfig/20210726-104953-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16894 and previous config saved to /var/cache/conftool/dbconfig/20210726-104649-marostegui.json
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db2149', diff saved to https://phabricator.wikimedia.org/P16893 and previous config saved to /var/cache/conftool/dbconfig/20210726-104613-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2149', diff saved to https://phabricator.wikimedia.org/P16892 and previous config saved to /var/cache/conftool/dbconfig/20210726-103847-marostegui.json
  • 10:33 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:55 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:15 XioNoX: rollback sampling for T286038
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host sretest1001.eqiad.wmnet
  • 08:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 07:18 _joe_: docker-image prune on deneb T287222
  • 07:17 _joe_: manage-production-images prune on deneb, T287222
  • 07:08 marostegui: Optimize dewiki.logging in eqiad (there will be lag)
  • 06:39 moritzm: installing krb5 security updates
  • 05:55 Amir1: start cleaning up auto-review flagged revs logs in plwiki

2021-07-24

  • 11:04 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=commonswiki --reason='OTRS -> VRTS renaming process; see Phab:T280392 and Phab:T280397' --move-subpages 'Commons:OTRS' 'Commons:Volunteer Response Team' 'Martin Urbanec' # T287321

2021-07-23

  • 19:11 topranks: Successfully re-pooled eqiad - reversed change from yesterday after successful line card replacement in cr2-codfw - T287110
  • 19:02 topranks: Re-pooling eqiad again after successful replacement of linecard in cr2-codfw T287110
  • 18:26 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:24 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:14 topranks: Turning up et-0/0/[0-1] and et-0/2/[0-1] interfaces on cr2-codfw after line card replacement slot 0.
  • 18:12 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 16:15 effie: enable puppet on mc-gp* hosts
  • 15:47 papaul: powerdown wdqs2002 for IDRAC reset
  • 15:45 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:11 elukey: stop ml-serve-ctrl1001 + gnt-instance modify -t plain ml-serve-ctrl1001.eqiad.wmnet on ganeti1009 + start instance back - T287238
  • 14:36 _joe_: rebuilding httpd-fcgi, mediawiki-http fixing logging T285384
  • 14:16 brennen: gitlab1001: running ansible to deploy fix puma exporter listen address (T275170)
  • 13:35 otto@deploy1002: Finished deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232 (duration: 03m 32s)
  • 13:31 otto@deploy1002: Started deploy [analytics/refinery@15521b3]: Add property disabling gobblin lock - T271232
  • 12:16 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 12:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309
  • 11:50 marostegui: Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - T287244
  • 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1446.eqiad.wmnet
  • 11:17 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1445.eqiad.wmnet
  • 11:11 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1443.eqiad.wmnet
  • 11:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[3-6].eqiad.wmnet
  • 11:00 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
  • 11:00 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1443,1445-1446].eqiad.wmnet with reason: new host
  • 10:58 arturo: adding packages to buster-wikimedia/thirdparty/kubeadm-k8s-1-19 @ apt1001
  • 10:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1442.eqiad.wmnet
  • 09:57 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1441.eqiad.wmnet
  • 09:49 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1440.eqiad.wmnet
  • 09:47 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1439.eqiad.wmnet
  • 09:20 hashar@deploy1002: Finished deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation (duration: 00m 11s)
  • 09:20 hashar@deploy1002: Started deploy [integration/docroot@edae2b4]: doc: add footer link to wikitech documentation
  • 08:59 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw144[0-2].eqiad.wmnet
  • 08:58 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1439.eqiad.wmnet
  • 08:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
  • 08:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1439-1442].eqiad.wmnet with reason: new host
  • 08:24 elukey: run 'gnt-instance modify -t plain ml-serve-ctrl1002.eqiad.wmnet' on ganeti1009 as test to track down latency/perf issues with kubelets
  • 03:11 ryankemper: T287223 Installed `nginx-light` on all of `cloudelastic*`, and it looks like `relforge` didn't need the upgrade. This operation is done.
  • 03:09 ryankemper: T287223 Installed `nginx-light` on all of `elastic1*` (eqiad)
  • 03:06 ryankemper: T287223 Installed `nginx-light` on all of `elastic2*` (codfw)
  • 02:53 ejegg: updated Fundraising CiviCRM from 819c11307d to 739c936298
  • 02:26 ryankemper: [WDQS] Pooled `wdqs1004` (all caught up on its mountain of lag)
  • 01:28 ejegg: updated payments-wiki from 844b59ee42 to cc5d14ea7f
  • 01:20 legoktm: legoktm@deneb:~$ docker rmi docker-registry.wikimedia.org/mwcachedir:0.0.1 # T287222

2021-07-22

  • 23:35 derick@deploy1002: Synchronized php-1.37.0-wmf.15/includes/preferences/DefaultPreferencesFactory.php: Backport: Make sure enable responsive mode UI reflects actual preference value (T285402) (duration: 00m 56s)
  • 19:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize several EventLogging -> Event Platform migrations - T282855 T238138 T282562 T271168 (duration: 00m 55s)
  • 19:08 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 19:07 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[1-2].eqiad.wmnet
  • 19:07 mutante: mw1421, mw1422 - scap pull, re-pool as new API servers after reimaging, previously appservers
  • 19:06 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 19:05 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[1-2].eqiad.wmnet
  • 19:04 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[1-2].eqiad.wmnet
  • 19:00 urbanecm: Start server-side upload for 1 video file (T287061)
  • 18:59 otto@deploy1002: Finished deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - T271232 (duration: 03m 22s)
  • 18:58 urbanecm: Start server-side upload for 1 video file (T286489)
  • 18:56 urbanecm: Start server-side upload for 1 video file (T286665)
  • 18:56 otto@deploy1002: Started deploy [analytics/refinery@3115f9e]: Set gobblin job.lock.dir after all - T271232
  • 18:53 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript namespaceDupes.php --wiki=hewikisource --fix # T286500
  • 18:52 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 26c23de: hewikisource: Add namespace aliases (T286500) (duration: 00m 55s)
  • 18:50 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 18:48 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 599c220: enwikisource: Create upload-shared user group (T285130) (duration: 00m 56s)
  • 18:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:41 otto@deploy1002: Finished deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - T271232 (duration: 03m 18s)
  • 18:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 18:38 otto@deploy1002: Started deploy [analytics/refinery@1ef4fe1]: bin/gobbin wrapper now avoids launching if job is already running - T271232
  • 18:32 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6a90930: Enable the visual editor on the 2021 namespace on Wikimania wiki (T287197) (duration: 00m 55s)
  • 18:23 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:22 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 18:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f765832: Add digital.ub.umu.se to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T287204) (duration: 00m 55s)
  • 18:13 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 18:10 legoktm: testing dc switchover warmup script in eqiad
  • 18:10 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 17:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 100%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16866 and previous config saved to /var/cache/conftool/dbconfig/20210722-174357-root.json
  • 17:41 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service (duration: 00m 20s)
  • 17:41 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b414857]: Mirror 10% of maps2007 traffic to the Tegola service
  • 17:33 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 17:33 mbsantos@deploy1002: Started deploy [kartotherian/deploy@bbb7ba8]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 75%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16865 and previous config saved to /var/cache/conftool/dbconfig/20210722-172853-root.json
  • 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
  • 17:22 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 17:22 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 17:21 mbsantos@deploy1002: Started deploy [kartotherian/deploy@b173b4f]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 17:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 50%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16864 and previous config saved to /var/cache/conftool/dbconfig/20210722-171349-root.json
  • 17:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 12s)
  • 17:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 17:03 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 21s)
  • 17:03 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@67a0db1]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 25%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16863 and previous config saved to /var/cache/conftool/dbconfig/20210722-165846-root.json
  • 16:56 brennen: gitlab1001: running ansible to deploy gerrit:706396 (T275170)
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 15%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16862 and previous config saved to /var/cache/conftool/dbconfig/20210722-164342-root.json
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 10%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16861 and previous config saved to /var/cache/conftool/dbconfig/20210722-162838-root.json
  • 16:27 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
  • 16:25 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2010.codfw.wmnet with reason: REIMAGE
  • 16:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring (duration: 00m 20s)
  • 16:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@0a38bc5]: Rollback maps2007 mirroring
  • 16:20 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op) (duration: 00m 20s)
  • 16:20 mbsantos@deploy1002: Started deploy [kartotherian/deploy@fb4bc10]: Preparing maps2007 to mirror traffic to the Tegola service (no-op)
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2091 (re)pooling @ 5%: After onsite maintenance', diff saved to https://phabricator.wikimedia.org/P16860 and previous config saved to /var/cache/conftool/dbconfig/20210722-161333-root.json
  • 15:45 marostegui: Stop db2091 for onsite maintenance
  • 15:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091', diff saved to https://phabricator.wikimedia.org/P16859 and previous config saved to /var/cache/conftool/dbconfig/20210722-154408-marostegui.json
  • 15:22 moritzm: installing dnspython bugfix updates from Buster 10.10 point release
  • 15:14 mmandere: pool lvs1015 - T286065
  • 15:14 jynus: shutdown db2097 for hw servicing T287072
  • 15:11 moritzm: re-enabled puppet after row C switch maintenance completed
  • 15:11 mmandere: pool cp108[3-6].eqiad.wmnet - T286065
  • 14:58 moritzm: disabled puppet temporarily for Row C switch maintenance
  • 14:50 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:50 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1015.eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:47 mmandere: depool lvs1015 - T286065
  • 14:40 mmandere@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:40 mmandere@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1083-1086].eqiad.wmnet with reason: Eqiad row C maintenance
  • 14:37 mmandere: depool cp108[3-6].eqiad.wmnet - T286065
  • 14:29 effie: restarting pybal in lvs2009 and lvs1015
  • 14:27 moritzm: installing libwebp security updates on stretch
  • 14:25 effie: restarting pybal in lvs2010 and lvs1016
  • 14:22 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:20 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0208fc2: Growth: Add mentor dashboard related config (T278920) (duration: 00m 55s)
  • 13:52 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:47 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 13:04 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.15
  • 12:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
  • 12:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1014.eqiad.wmnet with reason: REIMAGE
  • 12:40 Amir1: cleaning flaggedrevs auto-approve logs in dewiki
  • 12:17 Amir1: cleaning rest of auto-approve logs of ruwiki
  • 12:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
  • 12:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1421-1422].eqiad.wmnet with reason: new host
  • 11:36 Lucas_WMDE: EU backport+config window done
  • 11:35 hnowlan_: removing maps2010 from old maps cassandra cluster
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized w/touch.php: Config: Avoid using MWHttpRequest::factory() (2/2) (duration: 01m 04s)
  • 11:34 lucaswerkmeister-wmde@deploy1002: Synchronized w/favicon.php: Config: Avoid using MWHttpRequest::factory() (1/2) (duration: 01m 04s)
  • 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized w/robots.php: Config: Avoid using WikiPage::factory() (duration: 01m 06s)
  • 10:59 mutante: mw1421, mw1422 - puppetmaster - cleaning certs, reimaged hosts
  • 10:45 effie: restart pybal on lvs2009 and lvs1015
  • 10:45 jiji@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=mwdebug
  • 10:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 10:42 effie: restart pybal on lvs2010 and lvs1016
  • 10:40 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 10:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 10:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 10:19 mutante: mw1421, mw1422 - converting from app to API server for balance in row A
  • 10:09 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1422.eqiad.wmnet
  • 10:01 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 10:01 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 09:11 XioNoX: depool eqiad to reduce load on one codfw-eqiad link - T287110
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:34 XioNoX: cr2-codfw> request chassis fpc slot 0 offline - T287110
  • 07:24 hashar@deploy1002: Finished deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0 (duration: 00m 09s)
  • 07:24 hashar@deploy1002: Started deploy [integration/docroot@b3e39b0]: build: Updating mediawiki/mediawiki-codesniffer to 37.0.0
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170 (s2, s7), pool db1105 (s2) and db1098 (s7) into dump T286888', diff saved to https://phabricator.wikimedia.org/P16844 and previous config saved to /var/cache/conftool/dbconfig/20210722-070114-marostegui.json
  • 06:20 ryankemper: [WDQS] Pooled `wdqs1006` (was still depooled following data-transfer cookbook runs from several hours ago)
  • 05:41 ryankemper: [WDQS] Restarted `wdqs-blazegraph` on `wdqs1013`
  • 05:31 ryankemper: T281327 [Elastic] Unbanned `elastic2043.codfw.wmnet` from all 3 cirrus/elasticsearch clusters; node is back in the fleet
  • 00:52 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE
  • 00:50 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2043.codfw.wmnet with reason: REIMAGE

2021-07-21

  • 23:15 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score on enwikisource, plwikisource. Disable on all private/lockeddown wikis (T257066) (duration: 01m 03s)
  • 22:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:41 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
  • 22:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh categories journal to resolve categories update lag unknown alert status" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs`
  • 22:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 20:27 dancy: testing upcoming Scap release on beta
  • 18:27 ryankemper: T281327 [Elastic] `sudo -i wmf-auto-reimage-host -p T281327 elastic2043.codfw.wmnet` on `ryankemper@cumin2001` tmux session `reimage_elastic2043`
  • 18:21 ryankemper: [WDQS] Restarted `wdqs-updater` on `wdqs1004`
  • 18:19 XioNoX: cr2-codfw> request chassis fpc slot 0 restart
  • 18:17 ryankemper: [WDQS] Depooled `wdqs1004` and restarted `wdqs-blazegraph`
  • 18:17 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
  • 18:16 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/DiscussionTools/modules/ReplyLinksController.js: 1453831: Do not teardown newtopictool interface if it was not setup (T287035) (duration: 01m 04s)
  • 18:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/DiscussionTools/modules/ReplyLinksController.js: aca510b: Do not teardown newtopictool interface if it was not setup (T287035) (duration: 01m 05s)
  • 18:14 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2003.codfw.wmnet with reason: REIMAGE
  • 17:12 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
  • 17:10 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2002.codfw.wmnet with reason: REIMAGE
  • 16:50 urbanecm: [urbanecm@mwmaint2002 ~]$ time /usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/updateMenteeData.php # T285811
  • 16:32 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.repo.php: Backport: Define PREFETCHING_TERM_LOOKUP for all types in client and repo (T287085) (2/2) – I think this might be the fastest way to fix the errors (duration: 01m 05s)
  • 16:27 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.15/extensions/WikibaseLexeme/WikibaseLexeme.entitytypes.php: restore previous state after previous scap failed on canaries with seemingly legitimate error (duration: 01m 04s)
  • 16:23 lucaswerkmeister-wmde@deploy1002: scap failed: average error rate on 2/6 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
  • 16:17 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2001.codfw.wmnet with reason: REIMAGE
  • 16:04 jbond: upload cas_6.3.2-1+wmf10u1 to apt
  • 15:45 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
  • 15:45 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:05:00 on 10.3.0.1,cumin1001.mgmt,debmonitor.wikimedia.org with reason: testing new feature
  • 15:35 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet
  • 15:34 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
  • 15:17 moritzm: installing intel-microcode security updates on stretch
  • 15:11 moritzm: installing apt bugfix updates from Buster 10.10 point release
  • 14:51 reedy@deploy1002: Finished scap: Fix some VE translation issues for T286679 (duration: 04m 45s)
  • 14:46 reedy@deploy1002: Started scap: Fix some VE translation issues for T286679
  • 14:40 papaul: powerdown ms-be2038 for BBU replacement
  • 14:33 dzahn@cumin1001: conftool action : set/weight=10; selector: name=mw143[7-8].eqiad.wmnet,service=canary
  • 14:19 hashar@deploy1002: Finished deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197 (duration: 00m 09s)
  • 14:19 hashar@deploy1002: Started deploy [gerrit/gerrit@a5c9d35]: Update its-phabricator: Urlencode POST to conduit # T280197
  • 14:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1294.eqiad.wmnet
  • 14:07 Jeff_Green: authdns-update to remove deprecated records related to fundraising.wikimedia.org
  • 14:01 moritzm: imported systemd 241-5~bpo9+wmf1 to component/systemd241 T287036
  • 13:52 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1294.eqiad.wmnet
  • 13:49 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1293.eqiad.wmnet
  • 13:39 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1293.eqiad.wmnet
  • 13:20 jiji@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=mwdebug
  • 13:15 jiji@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mwdebug
  • 13:10 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.15 (duration: 01m 13s)
  • 13:09 godog: apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/705705 to puppetdb hosts
  • 13:09 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.15
  • 13:09 mutante: mw1293, mw1294 - formerly jobrunner canaries, depooled, replaced by new jobrunner canaries mw1437, mw1438
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[34].eqiad.wmnet
  • 13:04 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[78].eqiad.wmnet
  • 12:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
  • 12:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1437-1438].eqiad.wmnet with reason: new host
  • 11:26 hashar@deploy1002: Finished deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles (duration: 00m 09s)
  • 11:26 hashar@deploy1002: Started deploy [integration/docroot@0515d9c]: Support linking to individual doc.wikimedia.org tiles
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d6699da: GrowthExperiments: Add more wikis to linkrecommendation experiment (T284481) (duration: 01m 31s)
  • 10:50 moritzm: installing systemd security updates on bullseye
  • 10:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 10:14 effie: enable puppet on mw* servers
  • 09:57 XioNoX: Pushed FW rules to pfw3-eqiad/codfw - T287038
  • 09:34 jynus: restart db2097 T287072
  • 09:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.15
  • 09:05 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:00 hashar@deploy1002: Finished scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # T281156 (duration: 45m 51s)
  • 08:45 effie: disable puppet on codfw mw hosts to deploy 702592
  • 08:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 08:31 godog: upgrade karma on alert hosts - T284213
  • 08:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 T281058
  • 08:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s1 T281058
  • 08:17 effie: enable puppet on alert*
  • 08:15 hashar@deploy1002: Started scap: testwiki to php-1.37.0-wmf.15 and rebuild l10n cache # T281156
  • 08:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.12 (duration: 01m 35s)
  • 08:05 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.11 (duration: 11m 51s)
  • 08:02 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 07:56 XioNoX: push extra sampling on cr2-eqiad - T286038
  • 07:44 XioNoX: push extra sampling on cr1-eqiad - T286038
  • 07:38 XioNoX: update RIS peer IP on cr2-codfw
  • 07:16 godog: powercycle ms-be2048
  • 07:03 moritzm: installing systemd security updates on stretch
  • 06:51 effie: restart memcached on eqiad mc* hosts
  • 06:51 effie: enable puppet on mc* hosts
  • 06:35 effie: disable puppet on mc1* hosts and icinga - T271967
  • 05:40 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-07-20

  • 20:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: caa5a07: Set wgGEMentorDashboardBackendEnabled properly (T285811) (duration: 00m 57s)
  • 20:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/maintenance/updateMenteeData.php: dafd953: updateMenteeData: Make it possible to disable script per-wiki (T285811) (duration: 00m 58s)
  • 18:57 urbanecm: Start server-side upload for 4 large PNG files (T285708)
  • 18:05 Jeff_Green: authdns-update to point fundraising.wm.o CNAME to a new server
  • 17:57 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
  • 17:55 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1001.eqiad.wmnet with reason: REIMAGE
  • 17:06 rzl: enabled puppet on A:mw
  • 16:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:54 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:53 rzl: disabled puppet on A:mw to test https://gerrit.wikimedia.org/r/676508
  • 16:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:53 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 64 hosts with reason: dealing with an-master1001 rebuild issue
  • 16:44 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1297.eqiad.wmnet
  • 16:25 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1297.eqiad.wmnet
  • 16:21 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1290.eqiad.wmnet
  • 16:11 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1290.eqiad.wmnet
  • 16:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1289.eqiad.wmnet
  • 15:59 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1289.eqiad.wmnet
  • 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw129[07].eqiad.wmnet
  • 15:57 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1289.eqiad.wmnet
  • 15:48 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:23 vgutierrez: pool dns1002 - T286069
  • 15:21 vgutierrez: pool cp[1087-1090].eqiad.wmnet - T286069
  • 15:19 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1297.eqiad.wmnet
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1290.eqiad.wmnet
  • 15:14 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1289.eqiad.wmnet
  • 15:06 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T281058
  • 15:06 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T281058
  • 14:53 urbanecm: Start server-side upload for 7 large PNG files (T285708)
  • 14:51 herron: depooled and scheduled downtime for kafka-main100[45]
  • 14:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
  • 14:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs1016.eqiad.wmnet with reason: eqiad row D maintenance
  • 14:48 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
  • 14:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dns1002.wikimedia.org with reason: eqiad row D maintenance
  • 14:46 vgutierrez: depool dns1002 - T286069
  • 14:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
  • 14:40 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cp[1087-1090].eqiad.wmnet with reason: eqiad row D maintenance
  • 14:36 vgutierrez: depool cp[1087-1090].eqiad.wmnet - T286069
  • 14:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 T281058
  • 14:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 18 hosts with reason: Deploying schema change to s8 T281058
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:25 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 14:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T281058
  • 14:12 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T281058
  • 14:09 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:08 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:03 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2009.codfw.wmnet
  • 14:00 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:56 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2008.codfw.wmnet
  • 13:50 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T281058
  • 13:50 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T281058
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 13:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[89].codfw.wmnet
  • 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(10|0[1-9]).codfw.wmnet
  • 13:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T281058
  • 13:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T281058
  • 13:14 gehel: set/pooled=inactive on elastic1039 - disk failure - T285643
  • 13:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T281058
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T281058
  • 13:13 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
  • 12:44 moritzm: installing systemd security updates on buster
  • 12:23 elukey: reboot ml-serve-ctrl vms to pick up new vcores settings
  • 12:22 elukey: bump vcpus from 2 to 4 on ml-serve-ctrl VMs on Ganeti (load/cpu usage increased steadily since we deployed kubelets on them)
  • 11:58 Lucas_WMDE: EU config+backport window done
  • 11:58 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Avoid using User::newFrom* methods (3/3) (duration: 00m 56s)
  • 11:58 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
  • 11:58 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on maps1007.eqiad.wmnet with reason: Testing impact of tilerator
  • 11:56 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Avoid using User::newFrom* methods (2/3) (duration: 00m 56s)
  • 11:55 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/wikitech.php: Config: Avoid using User::newFrom* methods (1/3) (duration: 00m 56s)
  • 11:48 urbanecm@deploy1002: Synchronized logos/config.yaml: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 3/3) (duration: 00m 56s)
  • 11:47 urbanecm@deploy1002: Synchronized wmf-config/logos.php: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 2/3) (duration: 00m 56s)
  • 11:46 urbanecm@deploy1002: Synchronized static/images/project-logos: e52ae37: otrs_wikiwiki: Update logo to use VRT instead of OTRS (T280400; 1/3) (duration: 00m 57s)
  • 11:35 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add patroller group for ckbwiki (T285221) (duration: 00m 57s)
  • 11:23 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Config: Typo fix: "the the" -> "the" (T201491) (2/2, beta) (duration: 00m 56s)
  • 11:22 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Typo fix: "the the" -> "the" (T201491) (1/2, prod) (duration: 00m 57s)
  • 11:18 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update config for language switching on pilot wikis (T286459) (duration: 00m 59s)
  • 11:06 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:03 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:58 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:57 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:53 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:43 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps100[79].eqiad.wmnet
  • 10:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps100[79].eqiad.wmnet
  • 10:11 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:39 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058
  • 09:39 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T281058
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2352.codfw.wmnet
  • 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw2352.codfw.wmnet
  • 08:02 btullis: racadm serveraction powercycle on an-worker1106 due to CPU soft lock-ups on host
  • 07:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host idp-test2001.wikimedia.org
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2001.wikimedia.org
  • 07:10 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
  • 03:17 eileen: civicrm revision changed from 20e9ef6bbb to 819c11307d, config revision is bb405c5232

2021-07-19

  • 20:48 urbanecm: Deploy security patch for T286884
  • 20:29 vgutierrez: pool text@codfw - T286921
  • 20:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:18 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 20:08 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/export/WikiExporter.php: Backport: prevent PageIdentity checks in RevisionStore from breaking xml dumps (T286877) (duration: 00m 58s)
  • 19:21 Jeff_Green: authdns-update to remove payments100[1-4].frack.eqiad.wmnet
  • 19:14 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/Revision/RevisionStore.php: Backport: Add sanity check to newRevisionFromRowAndSlots. (T286877) (duration: 00m 57s)
  • 18:53 vgutierrez: running puppet and restarting pybal on lvs2009 - T286921
  • 18:46 topranks: Running homer to re-enable port xe-2/0/43 on asw2-a2-codfw (lvs2009) - T286921
  • 18:46 brennen: gerrit1001: restarting gerrit
  • 18:40 vgutierrez: stop pybal on lvs2009 - T286921
  • 18:38 brennen: re-enabling puppet on gerrit1001
  • 18:35 vgutierrez: running puppet and restarting pybal on lvs2010 - T286921
  • 18:27 ryankemper: T264053 Deploying fix for timer issue on relforge: `ryankemper@cumin1001:~$ sudo cumin -b 2 'P{relforge*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:27 topranks: Running homer to re-enable port xe-2/0/44 on asw2-a2-codfw (lvs2010)
  • 18:27 ryankemper: T264053 Deploying fix for timer issue on cloudelastic: `ryankemper@cumin1001:~$ sudo cumin -b 6 'P{cloudelastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:22 vgutierrez: disable puppet & stop pybal on lvs2010 - T286921
  • 18:20 vgutierrez: enabling pybal on lvs2007 - T286921
  • 18:19 ryankemper: T264053 Deploying fix for timer issue: `ryankemper@cumin1001:~$ sudo cumin -b 36 'P{elastic*}' 'sudo systemctl stop elasticsearch-disable-readahead.timer && sudo systemctl disable elasticsearch-disable-readahead.timer && rm -fv /etc/systemd/system/elasticsearch-disable-readahead.timer && rm -fv /usr/lib/systemd/system/elasticsearch-disable-readahead.timer && sudo run-puppet-agent'`
  • 18:14 topranks: Running homer to re-enable asw-a2-codfw xe-2/0/45 port [lvs2007]
  • 18:06 dancy@deploy1002: Synchronized .pipeline: Config: pipeline: Perform mergeMessageFileList and rebuildLocalisationCache separately (duration: 00m 56s)
  • 17:54 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
  • 17:54 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 22s)
  • 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:53 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 21s)
  • 17:53 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 15s)
  • 17:52 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:52 mbsantos@deploy1002: Finished deploy [tilerator/deploy@82e5f94]: (no justification provided) (duration: 00m 16s)
  • 17:51 mbsantos@deploy1002: Started deploy [tilerator/deploy@82e5f94]: (no justification provided)
  • 17:42 ryankemper: [Elastic] Noted `Jul 16 18:31:20 elastic2038 elasticsearch[957]: 2021-07-16 18:31:20,657 main ERROR Unknown GELF server hostname:udp:logstash.svc.eqiad.wmnet` in elasticsearch service logs (unit had been running for 2 days) thus the restart of the elasticsearch service
  • 17:41 ryankemper: [Elastic] Restarted elasticsearch services on `elastic2038`; afterwards restarted prometheus exporters; no units failed any longer
  • 17:30 volans: running puppet on elastic2038 after network was restored
  • 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 14s)
  • 17:26 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:26 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 16s)
  • 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:25 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:25 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:24 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:24 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:23 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@978b674]: (no justification provided) (duration: 00m 21s)
  • 17:23 volans: running authdns-update to force-update authdns2001
  • 17:23 mbsantos@deploy1002: Started deploy [kartotherian/deploy@978b674]: (no justification provided)
  • 17:23 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 XioNoX: remove ns1 redirect - T286787
  • 17:19 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:17 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 17:13 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1286-1287].eqiad.wmnet
  • 17:10 XioNoX: enable asw-a2-codfw access ports - T286787
  • 17:04 XioNoX: enable cr1-codfw / et-0/0/0 - T286787
  • 16:54 brennen: gerrit up and running with manual configuration edit to use ipv4 address
  • 16:51 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=logstash2021.codfw.wmnet
  • 16:51 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1286-1287].eqiad.wmnet
  • 16:46 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1284.eqiad.wmnet
  • 16:40 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001 (duration: 00m 08s)
  • 16:40 hashar: Upgrading gerrit1001 with dancy & brennen
  • 16:40 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit1001
  • 16:40 XioNoX: update asw-a2-codfw serial number - T286787
  • 16:39 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 16:33 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1284.eqiad.wmnet
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
  • 16:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2001.wikimedia.org with reason: maintenance
  • 16:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
  • 16:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash2021.codfw.wmnet with reason: maintenance
  • 16:21 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 16:21 hashar: upgrading gerrit replica on gerrit2001 and restarting
  • 16:21 mutante: depooled logstash2021 for dcops maintenance work
  • 16:20 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=logstash2021.codfw.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[6-7].eqiad.wmnet
  • 16:19 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1284.eqiad.wmnet
  • 16:18 dancy@deploy1002: Finished deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001 (duration: 00m 10s)
  • 16:18 dancy@deploy1002: Started deploy [gerrit/gerrit@4f29981]: Gerrit to 3.2.11 on gerrit2001
  • 16:15 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 310be45f7 (duration: 00m 57s)
  • 16:12 mutante: mw1434, mw1435, mw1436 - new API appservers in production, pooled first time
  • 16:11 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[4-6].eqiad.wmnet
  • 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[5-6].eqiad.wmnet
  • 16:08 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1434.eqiad.wmnet
  • 15:59 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: I2bdfbd258e (duration: 00m 57s)
  • 15:57 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I069c7b53 (duration: 00m 58s)
  • 15:49 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 15:43 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 15:12 sukhe: ran homer for Gerrit 705374: Remove malmok.wikimedia.org from anycast_neighbors in codfw
  • 15:10 godog: +100G to prometheus/ops in codfw
  • 14:59 vgutierrez: rolling restart of eqiad pybal instances
  • 14:43 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
  • 14:43 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1434-1436].eqiad.wmnet with reason: new host
  • 14:42 vgutierrez: rolling restart of codfw pybal instances
  • 14:33 vgutierrez: rolling restart of eqsin pybal instances
  • 14:23 vgutierrez: rolling restart of ulsfo pybal instances
  • 14:01 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1273-1275].eqiad.wmnet
  • 13:55 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:55 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:47 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1273-1275].eqiad.wmnet
  • 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:44 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1272.eqiad.wmnet
  • 13:41 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 13:32 sukhe@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:31 sukhe@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1272.eqiad.wmnet
  • 13:21 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1270.eqiad.wmnet
  • 13:12 jayme@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:09 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1270.eqiad.wmnet
  • 13:09 sukhe@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts malmok.wikimedia.org
  • 13:08 jayme@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1272.eqiad.wmnet
  • 13:06 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw127[3-5].eqiad.wmnet
  • 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1415.eqiad.wmnet,service=canary
  • 13:01 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts malmok.wikimedia.org
  • 13:01 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw1414.eqiad.wmnet,service=canary
  • 11:40 moritzm: installing bluez security updates
  • 11:31 Lucas_WMDE: EU backport+config window done
  • 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Add config for updated PropertySuggester beta cluster (T285098) (beta-only) (duration: 00m 57s)
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host copernicium.wikimedia.org
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host copernicium.wikimedia.org
  • 09:52 moritzm: imported megacli for bullseye-wikimedia T282272 T275873
  • 09:43 topranks: Running homer against cr2-eqdfw to change descr and move interface ae0, which connects to Facebook, into the external-links group.
  • 09:30 godog: bounce prometheus@k8s* on prometheus2004 due to cache not refreshing alert
  • 08:15 vgutierrez: depool codfw text traffic
  • 07:11 elukey: roll restart kafka mirror maker on kafka-main200* hosts - stuck after Friday's events/incident
  • 03:26 twentyafterfour: restarted phd on phab1001
  • 03:25 twentyafterfour: investigating PHD failure

2021-07-16

  • 19:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:48 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:10 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 19:08 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on copernicium.wikimedia.org with reason: REIMAGE
  • 18:29 ryankemper: [Elastic] Kicked off powercycle on `elastic2038`, this will effectively restart its `elasticsearch_6@production-search-omega-codfw.service`. We're back to 3 eligible masters for `codfw-omega`
  • 18:28 ryankemper: [Elastic] Restarted `elasticsearch_6@production-search-omega-codfw.service` on `elastic2051`; will restart on `elastic2038` by powercycling the node from mgmt port given that it is ssh unreachable
  • 18:24 ryankemper: [Elastic] `puppet-merge`d https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973; ran puppet across `elastic2*` hosts: `sudo cumin 'P{elastic2*}' 'sudo run-puppet-agent'` (puppet run succeeded on all but the 3 nodes taken offline by the switch failure: `elastic[2037-2038,2055].codfw.wmnet`)
  • 18:19 ryankemper: [Elastic] Given that we will likely have switch A3 out of commission over the weekend, Search team is going to change masters so that we no longer have a master in row A3. New desired config: `B1 (elastic2042), C2 (elastic2047), D2 (elastic2051)`, see https://gerrit.wikimedia.org/r/c/operations/puppet/+/704973
  • 16:29 vgutierrez: restarting pybal on lvs2009 to decrease api depool threshold
  • 15:48 vgutierrez: restart pybal on lvs2010
  • 15:38 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:34 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 15:24 godog: downtime flappy pages in codfw for 40 minutes
  • 15:14 godog: set alert2001 as active in netbox (was staged) - T247966
  • 15:14 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifup ens2f1np1
  • 14:41 vgutierrez: vgutierrez@lvs2009:~$ sudo -i ifdown ens2f1np1
  • 14:40 topranks: Running homer to disable et-0/0/0 on cr1-codfw, which connects to currently dead device asw-a2-codfw T286787
  • 14:40 topranks: Ran homer against asw-a-codfw virtual-chassis to change the config for all ports on dead switch asw-a2-codfw to disabled.
  • 14:40 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:37 vgutierrez: vgutierrez@lvs2010:~$ sudo -i ifdown ens2f1np1
  • 14:14 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps1004.eqiad.wmnet
  • 14:11 jiji@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 14:07 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 13:45 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 13:45 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 13:44 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 13:02 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw143[0-3].eqiad.wmnet
  • 12:56 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1429.eqiad.wmnet
  • 12:49 mutante: mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers (T279309)
  • 12:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
  • 12:48 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1430-1433].eqiad.wmnet with reason: new host
  • 12:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
  • 12:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: new host
  • 12:39 mutante: mw1412 through mw1428 - set to active in netbox (T279309)
  • 12:39 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:36 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1429.eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw143[0-3].eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw143[0-3].eqiad.wmnet
  • 12:35 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1429.eqiad.wmnet
  • 12:30 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:26 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[6-8].eqiad.wmnet
  • 12:17 mutante: mw1426,mw1427,mw1428 - scap pull
  • 12:16 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[6-8].eqiad.wmnet
  • 12:16 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[6-8].eqiad.wmnet
  • 12:14 mutante: mw1426, mw1427, mw1428, rebooting, new API servers moving into production
  • 12:12 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
  • 12:12 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1426-1428].eqiad.wmnet with reason: new host
  • 12:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 11:33 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2009.codfw.wmnet with reason: Service profiling tests
  • 11:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 11:29 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 8:00:00 on planet1002.eqiad.wmnet with reason: known issue
  • 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 09:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2007.codfw.wmnet
  • 09:47 jayme: cordon kubestage1002.eqiad.wmnet as it currently does not feed logs to logstash
  • 09:32 jelto: restart rsyslog on kubestage1001.eqiad.wmnet
  • 09:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for T273281
  • 09:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1031.eqiad.wmnet with reason: Rebooting for T273281
  • 08:54 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for T273281
  • 08:54 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1030.eqiad.wmnet with reason: Rebooting for T273281
  • 08:28 kormat@cumin1001: dbctl commit (dc=all): 'Depooling db1127 due to RAM failures T286763', diff saved to https://phabricator.wikimedia.org/P16827 and previous config saved to /var/cache/conftool/dbconfig/20210716-082829-kormat.json
  • 00:06 hoo: Updated the Wikidata property suggester with data from the 2021-07-12 JSON dump (with pre-applied T132839 workarounds)

2021-07-15

  • 23:32 brennen: checking stashbot: T286756
  • 23:28 brennen@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GlobalWatchlist/modules/watchlistUtils.js: Backport: Fix creation of mw.Message objects (T286385) (duration: 00m 57s)
  • 20:44 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=viwiki # T285811
  • 20:26 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=bnwiki # T285811
  • 20:11 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki # T285811
  • 20:10 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=cswiki #
  • 20:07 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 20:07 nskaggs@cumin1001: Added views for new wiki: shiwiki T284928
  • 19:54 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.14
  • 19:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
  • 19:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1013.eqiad.wmnet with reason: REIMAGE
  • 19:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
  • 19:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
  • 19:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1012.eqiad.wmnet with reason: REIMAGE
  • 19:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: REIMAGE
  • 19:45 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 19:28 volker-e@deploy1002: Finished deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: eebdc4d “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484) (duration: 00m 05s)
  • 19:28 volker-e@deploy1002: Started deploy [design/style-guide@eebdc4d]: Deploy design/style-guide: eebdc4d “Visual style – Icons”: Add Figma colors & icons file as source of truth (#484)
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 19:26 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 19:05 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 reedy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/EventLogging/includes/JsonSchemaHooks.php: T286611 (duration: 01m 06s)
  • 18:34 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/EventLogging/includes/JsonSchemaHooks.php: T286611 (duration: 01m 07s)
  • 18:32 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:11 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs (duration: 05m 41s)
  • 17:07 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:05 otto@deploy1002: Started deploy [analytics/refinery@7a673c9] (hadoop-test): Deploy refinery-source 0.1.15 to hadoop-test with fixes for Refine jobs
  • 17:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 otto@deploy1002: Finished deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs (duration: 17m 21s)
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 16:43 otto@deploy1002: Started deploy [analytics/refinery@7a673c9]: Deploy refinery-source 0.1.15 with fixes for Refine jobs
  • 16:40 ejegg: updated payments-wiki from d9892207c1 to 844b59ee42
  • 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 16:39 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 16:27 ejegg: updated fundraising CiviCRM from e0d53c92b5 to 20e9ef6bbb
  • 16:24 ejegg: updated payments-wiki from 0e7800027a to 844b59ee42
  • 16:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 16:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for T273281
  • 16:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1029.eqiad.wmnet with reason: Rebooting for T273281
  • 15:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for T273281
  • 15:51 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1028.eqiad.wmnet with reason: Rebooting for T273281
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for T273281
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1027.eqiad.wmnet with reason: Rebooting for T273281
  • 15:16 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Allow admins of idwiki to change stablesettings (T268317), try II (duration: 01m 05s)
  • 15:03 Amir1: temporarily becoming admin on idwiki to debug T268317
  • 15:02 moritzm: installing nginx security updates on ms-fe*
  • 14:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for T273281
  • 14:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1026.eqiad.wmnet with reason: Rebooting for T273281
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 14:40 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 14:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for T273281
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1024.eqiad.wmnet with reason: Rebooting for T273281
  • 14:11 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade T273281
  • 14:11 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2023.codfw.wmnet,es[1023-1025].eqiad.wmnet with reason: Rebooting es1024 (es5 eqiad primary) for kernel upgrade T273281
  • 14:10 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1301.eqiad.wmnet
  • 13:57 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
  • 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps200[7-9].codfw.wmnet
  • 13:47 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for T273281
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es1021.eqiad.wmnet with reason: Rebooting for T273281
  • 13:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade T273281
  • 13:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on es2021.codfw.wmnet,es[1020-1022].eqiad.wmnet with reason: Rebooting es1021 (es4 eqiad primary) for kernel upgrade T273281
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps20(0[1-6]|10).codfw.wmnet
  • 13:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps200[7-9].codfw.wmnet
  • 13:38 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1301.eqiad.wmnet
  • 13:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1300.eqiad.wmnet
  • 13:33 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw130[0-1].eqiad.wmnet
  • 13:22 mutante: mw1300, mw1301 - jobrunners going out of service, decom
  • 13:21 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1300.eqiad.wmnet
  • 13:20 jelto@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:17 jelto@cumin1001: START - Cookbook sre.dns.netbox
  • 13:12 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 13:10 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw130[0-1].eqiad.wmnet
  • 13:06 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1413.eqiad.wmnet
  • 13:05 mutante: mw1413 - pooling; was depooled for an unknown reason, don't see it in SAL, looks ok, scap pulled
  • 13:03 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw1422.eqiad.wmnet
  • 13:03 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for T273281
  • 13:03 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1165.eqiad.wmnet with reason: Rebooting for T273281
  • 13:02 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade T273281
  • 13:02 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebooting db1165 (s6 sanitarium master) for kernel upgrade T273281
  • 12:55 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for T273281
  • 12:55 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1127.eqiad.wmnet with reason: Rebooting for T273281
  • 12:54 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[3-5].eqiad.wmnet
  • 12:54 mutante: mw1423, mw1424, mw1425 - pooled as new API servers
  • 12:54 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
  • 12:53 dzahn@cumin1001: conftool action : set/weight=30; selector: name=mw142[3-5].eqiad.wmnet
  • 12:51 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw142[3-5].eqiad.wmnet
  • 12:41 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:41 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db2142.codfw.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for T273281
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db1103.eqiad.wmnet with reason: Rebooting for T273281
  • 12:40 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:40 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[1102-1103,1120,1137].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Rebooting db1103 (x1 primary) for kernel upgrade T273281
  • 12:35 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade T273281
  • 12:35 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1103.eqiad.wmnet with reason: Rebooting db1103 for kernel upgrade T273281
  • 12:34 mutante: mw1423, mw1424, mw1425 - scap pull
  • 12:31 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 12:09 mutante: mw1423,mw1424,mw1425 - rebooting
  • 11:48 moritzm: restarting restbase1028-1030 to pick up libuv security update
  • 11:47 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
  • 11:47 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1423-1425].eqiad.wmnet with reason: new host
  • 11:47 mutante: mw1423, mw1424, mw1425 - initial puppet run, new API appservers going into production
  • 11:43 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Make idwiki use protect mode of flaggedrevs (T268317) (duration: 01m 07s)
  • 11:40 moritzm: restarting Etherpad to pick up libuv security update
  • 11:37 moritzm: restarting Turnilo to pick up libuv security update
  • 11:34 moritzm: installing libuv1 security updates
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 10 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 11:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
  • 11:05 volans@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on thanos-fe2001.codfw.wmnet with reason: Extending downtime post-reimage
  • 10:56 volans: commented out cron-spam entries on thanos-fe2001, puppet is disabled, thanos-store.service fails to start - T285835
  • 10:41 godog: move wikibase.queryService.ui.app to wikibase.queryService.ui.index.app - T272128
  • 10:34 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 10:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 10:33 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 10:32 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 10:32 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:31 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:26 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 10:06 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 10:02 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:02 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:02 effie: disabling puppet on maps* for 704394
  • 09:38 dcausse@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'rdf-streaming-updater' for release 'main' .
  • 09:34 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 09:32 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 09:25 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 T278619
  • 09:25 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 12 hosts with reason: Deploying schema change to s3 T278619
  • 09:18 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 T278619
  • 09:18 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s1 T278619
  • 09:15 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 T278619
  • 09:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s7 T278619
  • 09:11 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-8].eqiad.wmnet
  • 09:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T278619
  • 09:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T278619
  • 09:04 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T278619
  • 09:04 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T278619
  • 08:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T278619
  • 08:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T278619
  • 08:56 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T278619
  • 08:56 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T278619
  • 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 08:31 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2001.codfw.wmnet with reason: REIMAGE
  • 08:29 elukey: sudo rm /etc/rawdog/en/feeds/847a7185.state* on planet1002 (corrupted file) - backup in /home/elukey + restart planet-update-en.service
  • 08:12 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-8].eqiad.wmnet
  • 08:11 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
  • 08:11 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309
  • 07:48 moritzm: updated bullseye d-i image for latest daily build T275873
  • 07:31 godog: reimage thanos-fe2001 with bullseye - T285835
  • 07:23 elukey: restart planet-update-en.service on planet1002
  • 07:17 elukey: remove /etc/rawdog/en/{state,state.lock} on planet1002 (following what rawdog suggested) due to corrupted files (backups available in /home/elukey/en)
  • 06:51 elukey: restart phabricator_clean_tmp_files.service on phab1001 - transient error (tmp files already cleaned up)
  • 06:49 tstarling@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 06s)
  • 06:47 tstarling@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: don't delete non-existent rows (T286521) (duration: 01m 07s)
  • 05:50 kart_: Updated cxserver to 2021-07-14-124232-production (T282369, T284450)
  • 05:47 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:43 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:41 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 00:00 twentyafterfour: phabricator update deployed.

2021-07-14

  • 23:23 eileen: civicrm revision changed from b1c63470bb to e0d53c92b5, config revision is bb405c5232
  • 21:19 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.14 (duration: 01m 05s)
  • 21:18 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.14
  • 21:08 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/User.php: Backport: Move saving user options to onTransactionPreCommitOrIdle (T286521) (duration: 01m 05s)
  • 20:58 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/includes/user/User.php: Backport: Move saving user options to onTransactionPreCommitOrIdle (T286521) (duration: 01m 05s)
  • 20:51 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.14
  • 19:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.14/resources: Backport: Fix deprecated offset() on invalid DOM (T185629) (duration: 01m 07s)
  • 19:31 andrew@deploy1002: Finished deploy [horizon/deploy@156a984]: fix trove-dashboard bug (duration: 04m 18s)
  • 19:26 andrew@deploy1002: Started deploy [horizon/deploy@156a984]: fix trove-dashboard bug
  • 19:17 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 19:17 nskaggs@cumin1001: Added views for new wiki: dagwiki T284456
  • 18:55 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:54 nskaggs@cumin1001: END (ERROR) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=97)
  • 18:54 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 18:36 nskaggs@cumin1001: END (PASS) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=0)
  • 18:36 nskaggs@cumin1001: Added views for new wiki: banwikisource T284390
  • 18:30 razzi@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 18:14 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 17:52 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 17:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 17:49 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 10 hosts
  • 17:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 10 hosts
  • 17:39 razzi@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid's jvm daemons. - razzi@cumin1001
  • 17:35 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 17:35 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: Do not lock preferences row for a rememberpassword check (T286521) (duration: 01m 06s)
  • 17:00 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/CentralAuth/includes/specials/SpecialCentralAutoLogin.php: Backport: Do not lock preferences row for a rememberpassword check (T286521) (duration: 01m 05s)
  • 16:27 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 16:26 root@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 16:11 dancy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Translate: Backport: TranslationAid: Handle empty message definition (T285830) and TranslationAid: Make sure to return successfully fetched definitions (T285830) (duration: 01m 09s)
  • 16:07 otto@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:37 moritzm: installing klibc security updates
  • 15:36 ottomata: deploying eventgate-analytics with direct service-runner prometheus support
  • 15:34 ryankemper: [Elastic] Manually triggering readahead mitigation across whole fleet to prevent any further issues today: `ryankemper@cumin1001:~$ sudo cumin -b 12 'P{elastic*}' 'sudo systemctl restart elasticsearch-disable-readahead.service'` (still need to investigate why `elasticsearch-disable-readahead.timer` isn't re-firing every 30 mins as desired)
  • 15:34 moritzm: installing apache security updates on otrs1001 (ticket.wikimedia.org)
  • 15:34 otto@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:28 urbanecm: Start server-side upload of 3 large image files (T285708)
  • 15:16 moritzm: installing apache security updates on lists1001 (lists.wikimedia.org)
  • 14:51 moritzm: installing apache security updates on puppet masters
  • 14:47 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2384.codfw.wmnet
  • 14:47 effie: set mw2384 as inactive to investigate mw2383 issue - T286463
  • 14:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:44 moritzm: installing apache security updates on grafana*
  • 14:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:40 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:40 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:38 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw1422.eqiad.wmnet
  • 14:33 dcausse: running elasticsearch-madvise-random ES_PID on elastic2045
  • 14:31 dcausse: running elasticsearch-madvise-random 1022 on elastic2054
  • 14:23 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:19 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:13 elukey: restart php-fpm on mw2370
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 13:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 13:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118
  • 13:09 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277118
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1005.eqiad.wmnet
  • 12:43 urbanecm: Start server-side upload of 3 large image files (T285708)
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1005.eqiad.wmnet
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 12:23 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 12:15 mutante: mw1422 - scap pull
  • 12:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1422.eqiad.wmnet
  • 12:02 moritzm: upgrading python3-wmflib fleetwide to 0.0.8 (needed for new logout.d wrapper)
  • 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
  • 12:01 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps2008.codfw.wmnet with reason: Bootstrapping cassandra in new cluster
  • 11:52 mutante: mw1422 - new setup, not in prod yet
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1733 hosts
  • 11:52 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
  • 11:52 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1422.eqiad.wmnet with reason: new host
  • 11:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1733 hosts
  • 11:49 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: Remove reviewer user group in ruwiki (T284589) (duration: 01m 05s)
  • 11:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
  • 11:39 ladsgroup@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs: Reduce levels for ruwiki to 1 (T284589) (duration: 01m 05s)
  • 11:37 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2008.codfw.wmnet with reason: REIMAGE
  • 11:23 ariel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw2383.codfw.wmnet
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 72027e1: Disable indexing in NS_USER and NS_USER_TALK on bnwiki (T286152) (duration: 02m 07s)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4dc11d2: Change category name of Babel extension on Javanese Wikipedia (T286165) (duration: 02m 10s)
  • 10:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 09:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277118
  • 09:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277118
  • 09:30 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277118
  • 09:30 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277118
  • 09:27 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php-1.37.0-wmf.14]$ time mwscript extensions/GrowthExperiments/maintenance/updateMenteeData.php --wiki=testwiki # T285811
  • 09:14 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277118
  • 09:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277118
  • 07:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277118
  • 07:49 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277118
  • 07:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277118
  • 07:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277118
  • 07:47 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T277118
  • 07:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 14 hosts with reason: Deploying schema change to s6 T277118
  • 00:58 eileen: process control updated to c291b3c
  • 00:58 eileen: c291b3c
  • 00:49 eileen: civicrm revision changed from bb62188ec6 to b1c63470bb, config revision is c291b3c689
  • 00:48 eileen: process-control config revision is c291b3c689
  • 00:15 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: fix conf cache conflict with Defines.php noticed in beta (duration: 02m 09s)

2021-07-13

  • 23:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.14/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: f362736: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector (T286587) (duration: 02m 08s)
  • 23:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialCreateAccountCampaign.php: f362736: SpecialCreateAccountCampaign: Ignore $wgLoginLanguageSelector (T286587) (duration: 02m 07s)
  • 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
  • 23:09 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
  • 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1006.eqiad.wmnet with reason: REIMAGE
  • 23:07 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1005.eqiad.wmnet with reason: REIMAGE
  • 23:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
  • 23:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1007.eqiad.wmnet with reason: REIMAGE
  • 22:22 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
  • 22:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1004.eqiad.wmnet with reason: REIMAGE
  • 22:18 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Use Score with lilypond's safe mode only (duration: 02m 06s)
  • 20:53 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 20:30 dancy@deploy1002: Synchronized php-1.37.0-wmf.14/includes/skins/Skin.php: Backport: links is flat array (T286040) (duration: 02m 07s)
  • 20:26 dancy@deploy1002: Pruned MediaWiki: 1.37.0-wmf.9 (duration: 04m 21s)
  • 20:19 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.14 (duration: 31m 56s)
  • 19:47 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.14
  • 19:02 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 17:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1283.eqiad.wmnet
  • 17:45 mutante: mw1283 - decom - powered off by cookbook
  • 17:44 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1283.eqiad.wmnet
  • 17:41 mutante: homer "asw2-a*eqiad*" commit "decom mw1282 - T280203"
  • 17:30 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
  • 17:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
  • 17:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mw1282.eqiad.wmnet
  • 17:09 mutante: mw1282 - decom, powered off
  • 17:08 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1282.eqiad.wmnet
  • 17:06 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1281.eqiad.wmnet
  • 17:05 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/user/UserOptionsManager.php: Backport: Do not lock user_preferences before updating (T286521) (duration: 01m 58s)
  • 16:59 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Firmware upgrade T286226
  • 16:59 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Firmware upgrade T286226
  • 16:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade T286226
  • 16:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1104.eqiad.wmnet with reason: Firmware upgrade T286226
  • 16:55 jbond: upload statograph to buster wikimedia
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1281.eqiad.wmnet
  • 16:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom T28203
  • 16:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw[1282-1283].eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on mw1281.eqiad.wmnet with reason: decom T28203
  • 16:25 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw128[1-3].eqiad.wmnet
  • 15:34 topranks: Adding IX peering to AS393950 (Xiber LLC) on cr2-eqiad.
  • 15:20 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 15:19 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 14:52 volker-e@deploy1002: Finished deploy [design/style-guide@5c07233]: Deploy design/style-guide: 5c07233 “Components”: Add WikimediaUI theme Figma links to various components (#483) (duration: 00m 06s)
  • 14:52 volker-e@deploy1002: Started deploy [design/style-guide@5c07233]: Deploy design/style-guide: 5c07233 “Components”: Add WikimediaUI theme Figma links to various components (#483)
  • 14:35 nskaggs@cumin1001: END (FAIL) - Cookbook wmcs.wikireplicas.add_wiki (exit_code=99)
  • 14:35 nskaggs@cumin1001: START - Cookbook wmcs.wikireplicas.add_wiki
  • 13:57 otto@deploy1002: Finished deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - T271232 (duration: 03m 28s)
  • 13:53 otto@deploy1002: Started deploy [analytics/refinery@a3bc8bc]: Add eventlogging_legacy gobblin job - T271232
  • 13:37 effie: rolling restart php-fpm across clusters - T286260
  • 13:33 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/lib/includes/SimpleCacheWithBagOStuff.php: Backport: Send TTL instead of expiry in unix timestamp in calling BagOStuff (T286260) (duration: 00m 58s)
  • 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 2 hosts
  • 13:29 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 2 hosts
  • 13:14 kormat: restarted replication on db1117:3325 T284622
  • 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:10 hashar: Upgraded Apache on gerrit1001 and gerrit2001
  • 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
  • 13:08 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:53 kormat: stopping replication on db1117:3325 T284622
  • 12:53 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 T284622
  • 12:53 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db1117.eqiad.wmnet with reason: Copy m5 from db1117 to db1183 T284622
  • 12:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 1732 hosts
  • 12:41 mutante: depooling and decom'ing eqiad API servers mw1281, mw1282, mw1283 - T280203
  • 12:40 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw128[1-3].eqiad.wmnet
  • 12:20 mutante: mwmaint1002 - scap pull after reimaging
  • 11:33 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
  • 11:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwmaint1002.eqiad.wmnet with reason: REIMAGE
  • 11:28 Lucas_WMDE: EU backport+config window done
  • 11:25 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Remove obsolete $wgShowDBErrorBacktrace config (duration: 01m 25s)
  • 11:13 mutante: mwmaint1002 - reimaging with buster (T267607)
  • 10:54 mutante: switching https://noc.wikimedia.org backend from eqiad to codfw for mwmaint1002 OS upgrade, not affecting config-master/pybal, tests passed (T267607)
  • 10:44 moritzm: upgrading apache on phab1001 (phabricator.wikimedia.org)
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2008.codfw.wmnet with reason: reimaging as buster replica
  • 10:39 hnowlan: running `nodetool decommission` on maps2008
  • 10:27 moritzm: installing apache security updates on alert1001 (icinga.wikimedia.org)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s1 T277116
  • 10:18 moritzm: installing apache security updates on Logstash hosts
  • 09:58 moritzm: upgrading PHP/Apache on matomo1002 (piwik.wikimedia.org)
  • 09:40 moritzm: installing apache security updates on thanos-fe hosts
  • 09:38 moritzm: installing apache security updates on parsoid hosts
  • 09:31 effie: depool mw2383 T286463
  • 09:18 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:15 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:00 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116
  • 09:00 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 12 hosts with reason: Deploying schema change to s3 T277116
  • 08:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
  • 08:59 volans@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on sretest1001.eqiad.wmnet with reason: testing the cookbook
  • 08:45 effie: depool mw2383 - T286463
  • 08:02 moritzm: upgrade bullseye pilot installs to latest state of bullseye
  • 07:06 moritzm: installing apache security updates on codfw mw* hosts
  • 06:53 elukey: systemctl reset-failed ifup@ens5 on gitlab2001 - T273026
  • 06:06 effie: pool mw2383 - T286463
  • 04:09 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 04:09 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 04:05 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 08m 28s)
  • 03:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
  • 03:55 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@36f74b3]: 0.3.76 (duration: 02m 22s)
  • 03:54 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.76` on canary `wdqs1003`; proceeding to rest of fleet
  • 03:53 ryankemper@deploy1002: Started deploy [wdqs/wdqs@36f74b3]: 0.3.76
  • 03:53 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.76`. Pre-deploy tests passing on canary `wdqs1003`

2021-07-12

  • 23:57 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1896efc: Add sayahna.org to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T286163) (duration: 00m 56s)
  • 23:51 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=T286396 # T286396
  • 23:50 urbanecm: urbanecm@mwmaint2002:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # T286396
  • 23:50 urbanecm: Delete Project:BROKENPesak at sr.wikipedia to be able to rerun namespaceDupes.php (T286396)
  • 23:45 urbanecm: urbanecm@mwmaint2001:~$ mwscript namespaceDupes.php --wiki=srwiki --fix --add-prefix=BROKEN # T286396
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 284216a: Add few namespace aliases for Serbian Wikipedia (T286396) (duration: 00m 56s)
  • 23:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8a79bf7: enwiki: Delete Book namespace (T285766) (duration: 00m 57s)
  • 23:29 urbanecm@deploy1002: Synchronized static/images/: d007b9c: Remove unused celebration logos and wordmark (T286380) (duration: 00m 57s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6c58149: Add editautoreviewprotected to bot on hewikisource (T275076) (duration: 00m 57s)
  • 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 40eade4: Enable RelatedArticles Extension in zhwikinews (T266933) (duration: 00m 57s)
  • 23:15 urbanecm: urbanecm@mwmaint1002:~$ mwscript namespaceDupes.php --wiki=zhwiktionary --fix --add-prefix=BROKEN # T286101, P16817
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ab00d1: zhwiktionary: Add templateeditor right (T286101) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5822b2b: zhwiktionary: Add aliases for namespaces (T286101) (duration: 00m 57s)
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ba0967f: zhwiktionary: Add Reconstruction namespace (T286101) (duration: 00m 57s)
  • 22:53 legoktm: root@urldownloader2002:/var/cache/apt# rm -rf * to free up space
  • 21:26 urbanecm: Start server-side upload for 2 video files (T286432, T286433)
  • 18:41 otto@deploy1002: Finished deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - T271232 (duration: 03m 39s)
  • 18:37 otto@deploy1002: Started deploy [analytics/refinery@200b502]: Finalize event_default gobblin job - T271232
  • 18:12 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable Score using Shellbox on testwiki (T257066) (duration: 00m 58s)
  • 16:15 ppchelko@deploy1002: Finished deploy [restbase/deploy@b05ade3]: Add newly created wikis T284929 T284457 T284392 (duration: 21m 24s)
  • 16:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime
  • 16:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116 - extending downtime
  • 15:54 ppchelko@deploy1002: Started deploy [restbase/deploy@b05ade3]: Add newly created wikis T284929 T284457 T284392
  • 15:31 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116
  • 15:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 18 hosts with reason: Deploying schema change to s4 T277116
  • 15:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116
  • 15:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s7 T277116
  • 15:24 elukey: expand ML k8s iBGP neighbors to include the master nodes (ref: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/704104)
  • 15:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116
  • 15:15 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Deploying schema change to s2 T277116
  • 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1002.wikimedia.org
  • 15:08 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116
  • 15:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: Deploying schema change to s5 T277116
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1002.wikimedia.org
  • 14:58 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116
  • 14:58 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: Deploying schema change T277116
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica1001.wikimedia.org
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica1001.wikimedia.org
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2004.wikimedia.org
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2004.wikimedia.org
  • 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ldap-replica2003.wikimedia.org
  • 14:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ldap-replica2003.wikimedia.org
  • 14:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps2010.codfw.wmnet
  • 13:59 otto@deploy1002: Finished deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - T271232 (duration: 03m 30s)
  • 13:56 otto@deploy1002: Started deploy [analytics/refinery@dd65f38]: event_default gobblin job - fix typo - T271232
  • 13:52 otto@deploy1002: Finished deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - T271232 (duration: 03m 16s)
  • 13:49 otto@deploy1002: Started deploy [analytics/refinery@0149c81]: Set event_default gobblin job max mappers=128 - T271232
  • 13:36 otto@deploy1002: Finished deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - T271232 (duration: 03m 37s)
  • 13:32 otto@deploy1002: Started deploy [analytics/refinery@1cb9e12]: Add event_default gobblin job - T271232
  • 12:51 volans@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:48 volans@cumin2002: START - Cookbook sre.dns.netbox
  • 12:42 volans: reverting Primary IP allocation for pc1011-1014, leaving only mgmt IPs - T282484
  • 12:34 hnowlan@puppetmaster1001: conftool action : set/weight=5; selector: name=maps2004.codfw.wmnet
  • 11:58 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable template search improvements on first wikis 2/2 (T284553) (duration: 00m 57s)
  • 11:54 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable template search improvements on first wikis 1/2 (T284553) (duration: 00m 56s)
  • 11:49 wmde-fisch@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/VisualEditor/modules/ve-mw/ui/widgets/ve.ui.MWTemplateTitleInputWidget.js: Backport: Always add 1 prefixsearch match when searching for templates (duration: 00m 57s)
  • 11:47 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps100[1-4].eqiad.wmnet
  • 11:45 hnowlan: adjusting weights of eqiad maps servers to reduce load on older spec machines
  • 11:40 moritzm: installing apache updates on mw1/eqiad hosts
  • 11:38 hnowlan: adjusting weights of codfw maps servers to reduce load on older spec machines
  • 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2004.codfw.wmnet
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 773c956: Revert "Use ptwiki 20th anniversary logos" (T286380) (duration: 00m 57s)
  • 11:34 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2003.codfw.wmnet
  • 11:33 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: name=maps2001.codfw.wmnet
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cd5f537: Revert "ptwiki: Use celebration logos in new vector" (T286380) (duration: 00m 57s)
  • 11:26 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add 'editautoreviewprotected' protection level to hewikisource (T275076) (duration: 00m 57s)
  • 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2010.codfw.wmnet
  • 11:19 hnowlan: testing a depool of maps2010 to ensure kartotherian load can cope with two less nodes
  • 11:12 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable transclusion back button on first wikis (T284553) (duration: 00m 58s)
  • 11:01 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2008.codfw.wmnet
  • 10:58 hnowlan: testing a depool of maps2008 to ensure kartotherian load can cope with one less node
  • 10:30 moritzm: installing apache updates on an-tool* hosts (affects Turnilo, Yarn, Superset, Hue) briefly
  • 10:11 elukey: add 10g disk to ml-serve-ctrl[12]00[12] for T285927
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
  • 10:05 mutante: planet - deleting state files, manually running update for all 161 en feeds - T285251
  • 10:03 effie: depool mw2383
  • 10:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
  • 10:01 godog: test thanos-compact upload with smaller part size - T285835
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 09:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1006.eqiad.wmnet
  • 09:12 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 09:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1183.eqiad.wmnet with reason: REIMAGE
  • 09:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rdb1006.eqiad.wmnet
  • 09:07 godog: repool thanos-fe2002 - T285835
  • 08:38 godog: test a single frontend for thanos-swift / thanos-query to test "bad host" theory - T285835
  • 08:26 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/client: Backport: Remove subscribing to other aspect for entity usage (T286193) (duration: 00m 59s)
  • 07:44 jynus: restart db1102:x1 mariadb instance
  • 07:01 moritzm: installing apache2 security updates
  • 05:14 Amir1: start of mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --batch-size=10 --verbose --mime="application/pdf" --force --sleep 5 on screen - It will take days / week to finish (T275268)
  • 05:06 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: Enable json image metadata everywhere (T275268) (duration: 01m 05s)
  • 04:56 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.12/maintenance/refreshImageMetadata.php: Backport: Add --sleep option to refreshImageMetadata.php (duration: 01m 04s)
  • 04:10 Amir1: mwscript refreshImageMetadata.php --wiki=testcommonswiki --mediatype=OFFICE --batch-size=20 --verbose --mime="application/pdf" --force (T275268)
  • 04:08 ladsgroup@deploy1002: Synchronized wmf-config/filebackend.php: Config: Set testcommonswiki to use json image metadata (T275268) (duration: 01m 10s)

2021-07-09

  • 23:28 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 23:27 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 22:36 legoktm: running benchmarking scripts against shellbox
  • 14:49 otto@deploy1002: Finished deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - T271232 (duration: 03m 08s)
  • 14:46 otto@deploy1002: Started deploy [analytics/refinery@cdb3fc5] (hadoop-test): Deploy for finalize event_default_test gobblin job in hadoop test - T271232
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1118', diff saved to https://phabricator.wikimedia.org/P16809 and previous config saved to /var/cache/conftool/dbconfig/20210709-115609-marostegui.json
  • 11:40 _joe_: deleting coredns pod in codfw, potentially causing T286360
  • 10:13 _joe_: recreated all pods for zotero in codfw
  • 00:47 legoktm: zotero rolling restart didn't help, filed T286360 for DNS issues
  • 00:39 legoktm: doing a rolling restart of zotero in codfw to hopefully fix DNS ENOTFOUND issues

2021-07-08

  • 22:48 legoktm@deploy1002: Synchronized wmf-config/CommonSettings.php: Add configuration to use Score with Shellbox (still disabled) (2/2) - T281423 (duration: 00m 57s)
  • 22:46 legoktm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add configuration to use Score with Shellbox (still disabled) (1/2) - T281423 (duration: 00m 58s)
  • 19:29 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/includes/Score.php: Allow setting a different path for `convert` just for Score (2/2) (duration: 00m 57s)
  • 19:27 legoktm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Score/extension.json: Allow setting a different path for `convert` just for Score (1/2) (duration: 00m 58s)
  • 18:56 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:55 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:53 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:02 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1] (duration: 05m 38s)
  • 16:56 joal@deploy1002: Started deploy [analytics/refinery@51a73f1] (hadoop-test): Analytics deploy for Gobblin replacing Camus - hadoop-test [analytics/refinery@51a73f1]
  • 16:47 joal@deploy1002: Finished deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1] (duration: 03m 17s)
  • 16:44 joal@deploy1002: Started deploy [analytics/refinery@51a73f1]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@51a73f1]
  • 15:37 otto@deploy1002: Finished deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - T271232 (duration: 03m 06s)
  • 15:34 otto@deploy1002: Started deploy [analytics/refinery@9883dbf] (hadoop-test): Deploy for event_default_test job in hadoop test - T271232
  • 15:29 otto@deploy1002: Finished deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - T271232 (duration: 05m 27s)
  • 15:23 otto@deploy1002: Started deploy [analytics/refinery@51f4696] (hadoop-test): Deploy for eventlogging_legacy gobblin with final import path - T271232
  • 15:11 otto@deploy1002: Finished deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - T271232 (duration: 05m 42s)
  • 15:05 otto@deploy1002: Started deploy [analytics/refinery@42541e6] (hadoop-test): Deploy for eventlogging_legacy gobblin migration - T271232
  • 14:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Add consumers.analytics_hadoop-ingestion stream config settings for automated gobblin imports - T271232 T273901 (duration: 01m 09s)
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16807 and previous config saved to /var/cache/conftool/dbconfig/20210708-134421-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16806 and previous config saved to /var/cache/conftool/dbconfig/20210708-132917-root.json
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16805 and previous config saved to /var/cache/conftool/dbconfig/20210708-131414-root.json
  • 13:04 otto@deploy1002: Finished deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - T271232 (duration: 03m 22s)
  • 13:01 otto@deploy1002: Started deploy [analytics/refinery@2d4c645]: Make gobblin-netflow use production directory - T271232
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2103 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16804 and previous config saved to /var/cache/conftool/dbconfig/20210708-125910-root.json
  • 12:52 moritzm: installing klibc security updates on buster
  • 12:38 moritzm: installing openexr security updates
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2103', diff saved to https://phabricator.wikimedia.org/P16803 and previous config saved to /var/cache/conftool/dbconfig/20210708-105353-marostegui.json
  • 10:20 jbond: upgrade golang-cfssl
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16802 and previous config saved to /var/cache/conftool/dbconfig/20210708-100947-root.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16801 and previous config saved to /var/cache/conftool/dbconfig/20210708-095443-root.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16800 and previous config saved to /var/cache/conftool/dbconfig/20210708-093939-root.json
  • 09:25 jbond: upload golang-github-cloudflare-cfssl_1.6.0-1_amd64 to bullseye
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'db2116 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16799 and previous config saved to /var/cache/conftool/dbconfig/20210708-092436-root.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2116', diff saved to https://phabricator.wikimedia.org/P16798 and previous config saved to /var/cache/conftool/dbconfig/20210708-092411-marostegui.json
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16797 and previous config saved to /var/cache/conftool/dbconfig/20210708-090456-root.json
  • 09:01 oblivian@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:58 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16796 and previous config saved to /var/cache/conftool/dbconfig/20210708-084952-root.json
  • 08:50 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:42 moritzm: imported ganeti 2.16.0 for stretch-security/component/ganeti216 T284811
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16795 and previous config saved to /var/cache/conftool/dbconfig/20210708-083449-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2130 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16794 and previous config saved to /var/cache/conftool/dbconfig/20210708-081945-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2130', diff saved to https://phabricator.wikimedia.org/P16793 and previous config saved to /var/cache/conftool/dbconfig/20210708-081922-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16792 and previous config saved to /var/cache/conftool/dbconfig/20210708-060812-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16791 and previous config saved to /var/cache/conftool/dbconfig/20210708-055309-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16790 and previous config saved to /var/cache/conftool/dbconfig/20210708-053805-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db2092 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16789 and previous config saved to /var/cache/conftool/dbconfig/20210708-052302-root.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2092', diff saved to https://phabricator.wikimedia.org/P16788 and previous config saved to /var/cache/conftool/dbconfig/20210708-052216-marostegui.json

2021-07-07

  • 20:22 legoktm: repooling eqiad - https://gerrit.wikimedia.org/r/703561
  • 18:06 legoktm@deploy1002: Synchronized wmf-config/ProductionServices.php: Add Shellbox to {Production,Labs}Services.php (2/2) (duration: 00m 59s)
  • 18:05 legoktm@deploy1002: Synchronized wmf-config/LabsServices.php: Add Shellbox to {Production,Labs}Services.php (1/2) (duration: 00m 59s)
  • 18:04 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - T271232 (duration: 05m 28s)
  • 17:59 legoktm@deploy1002: Synchronized private/readme.php: Document $wgShellboxSecretKey in private/readme.php (duration: 01m 01s)
  • 17:58 otto@deploy1002: Started deploy [analytics/refinery@46c0b84] (hadoop-test): Deploy for gobblin migration - Refine now supports gzip - T271232
  • 17:54 otto@deploy1002: Finished deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - T271232 (duration: 17m 22s)
  • 17:36 otto@deploy1002: Started deploy [analytics/refinery@46c0b84]: Deploy for gobblin migration - Refine now supports gzip - T271232
  • 16:55 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462] (duration: 03m 10s)
  • 16:52 joal@deploy1002: Started deploy [analytics/refinery@b5c4462]: Analytics deploy for Gobblin replacing Camus - an-launcher1002 only [analytics/refinery@b5c4462]
  • 16:28 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:15 joal@deploy1002: Finished deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462] (duration: 10m 21s)
  • 16:05 joal@deploy1002: Started deploy [analytics/refinery@b5c4462] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@b5c4462]
  • 16:03 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 16:01 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:25 oblivian@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:49 moritzm: installing djvulibre security updates
  • 14:05 _joe_: powercycling mw2267, stuck without network, blank console
  • 13:25 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - T271232 (duration: 05m 41s)
  • 13:19 otto@deploy1002: Started deploy [analytics/refinery@8de71e6] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin dir fixes - T271232
  • 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 13:13 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 13:12 otto@deploy1002: Finished deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - T271232 (duration: 03m 11s)
  • 13:09 otto@deploy1002: Started deploy [analytics/refinery@8de71e6]: analytics cluster deploy for webrequest gobblin job migration - T271232
  • 12:12 urbanecm: Start server-side upload for 3 video files (T286173, T286175, T286174)
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx1002.wikimedia.org
  • 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx1002.wikimedia.org
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx2002.wikimedia.org
  • 11:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host mx2002.wikimedia.org
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16782 and previous config saved to /var/cache/conftool/dbconfig/20210707-112149-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16781 and previous config saved to /var/cache/conftool/dbconfig/20210707-110645-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16780 and previous config saved to /var/cache/conftool/dbconfig/20210707-105142-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2087:3316 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16779 and previous config saved to /var/cache/conftool/dbconfig/20210707-103638-root.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316', diff saved to https://phabricator.wikimedia.org/P16778 and previous config saved to /var/cache/conftool/dbconfig/20210707-103553-marostegui.json
  • 07:56 moritzm: bounced elasticsearch_5@production-logstash-eqiad on logstash1009
  • 07:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .

2021-07-06

  • 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 18:34 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 18:03 otto@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'eventgate-analytics' for release 'production' .
  • 17:25 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0] (duration: 05m 31s)
  • 17:20 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (hadoop-test): Analytics deploy for Gobblin replacing Camus - HADOOP-TEST [analytics/refinery@419d1f0]
  • 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0] (duration: 00m 07s)
  • 17:19 joal@deploy1002: Started deploy [analytics/refinery@419d1f0] (thin): Analytics deploy for Gobblin replacing Camus - THIN [analytics/refinery@419d1f0]
  • 17:19 joal@deploy1002: Finished deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0] (duration: 36m 59s)
  • 16:42 joal@deploy1002: Started deploy [analytics/refinery@419d1f0]: Analytics deploy for Gobblin replacing Camus [analytics/refinery@419d1f0]
  • 15:54 otto@deploy1002: Finished deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration (duration: 05m 24s)
  • 15:48 otto@deploy1002: Started deploy [analytics/refinery@a8e79f3] (hadoop-test): analytics test cluster deploy for webrequest_test gobblin job migration
  • 14:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16777 and previous config saved to /var/cache/conftool/dbconfig/20210706-140049-root.json
  • 13:53 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:49 otto@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 13:49 otto@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16776 and previous config saved to /var/cache/conftool/dbconfig/20210706-134545-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16775 and previous config saved to /var/cache/conftool/dbconfig/20210706-133041-root.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2072 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16774 and previous config saved to /var/cache/conftool/dbconfig/20210706-131537-root.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 100%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16773 and previous config saved to /var/cache/conftool/dbconfig/20210706-120242-root.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P16772 and previous config saved to /var/cache/conftool/dbconfig/20210706-115820-marostegui.json
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16771 and previous config saved to /var/cache/conftool/dbconfig/20210706-115732-marostegui.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 75%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16770 and previous config saved to /var/cache/conftool/dbconfig/20210706-114739-root.json
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 50%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16769 and previous config saved to /var/cache/conftool/dbconfig/20210706-113235-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2071 (re)pooling @ 25%: Repool after index change', diff saved to https://phabricator.wikimedia.org/P16768 and previous config saved to /var/cache/conftool/dbconfig/20210706-111731-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P16767 and previous config saved to /var/cache/conftool/dbconfig/20210706-111635-marostegui.json
  • 10:19 moritzm: installing jackson-databind security updates on buster
  • 09:01 _joe_: repooling wdqs1007 now that lag has caught up
  • 08:43 moritzm: installing libuv1 security updates on buster
  • 07:06 marostegui: Upgrade db1104 kernel
  • 06:54 moritzm: installing PHP 7.3 security updates on buster
  • 06:50 marostegui: Upgrade db1122 kernel
  • 06:35 marostegui: Upgrade db1138 kernel
  • 06:31 marostegui: Upgrade db1160 kernel
  • 00:56 eileen: process-control config revision is 8d46b52ed4

2021-07-05

  • 17:40 legoktm: published fixed docker-registry.discovery.wmnet/nodejs10-devel:0.0.4 image (T286212)
  • 15:24 _joe_: leaving wdqs1007 depooled so that the updater can recover faster, now at 16.5 hours of lag
  • 14:01 moritzm: uploaded nginx 1.13.9-1+wmf3 for stretch-wikimedia
  • 12:50 marostegui: Stop MySQL on db1117:3321 to clone db1125 T286042
  • 11:29 moritzm: installing openexr security updates on stretch
  • 11:07 moritzm: installing tiff security updates on stretch
  • 10:48 moritzm: upgrading PHP on miscweb*
  • 10:37 jbond: enable puppet fleet wide to post puppetdb change
  • 10:29 marostegui: Optimize ruwiki.logging on s6 eqiad with replication T286102
  • 10:27 jbond: disable puppet fleet wide to perform puppetdb change
  • 08:15 moritzm: rolling out debmonitor-client 0.3.0
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases1002.eqiad.wmnet with reason: bump CPU count
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
  • 07:55 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: bump CPU count
  • 07:04 _joe_: restarting blazegraph, then restarting the updater again
  • 06:48 moritzm: start rasdaemon on sretest1001, didn't start after last reboot from a week ago
  • 06:47 _joe_: restart wdqs-updater on wdqs1007
  • 00:53 eileen: process-control config revision is a1717c7fde
  • 00:47 eileen: process-control config revision is 24565578f7

2021-07-04

2021-07-03

  • 17:46 elukey: depool eqsin due to loss of power redundancy (equinix maintenance) - T286113
  • 09:12 Amir1: restarting mailman3-web on lists1001 to pick up patches for T283659
  • 08:53 Amir1: patching postorius and mailmanclient on lists1001 for T283659

2021-07-02

  • 22:06 foks: removing three files for legal compliance
  • 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:36 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:24 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:22 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:04 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:54 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 15:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:17 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:07 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:05 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dragonfly-supernode1001.eqiad.wmnet
  • 15:02 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
  • 14:54 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host dragonfly-supernode1001.eqiad.wmnet
  • 14:53 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
  • 14:52 kormat@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbstore1004.eqiad.wmnet
  • 14:40 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw142[0-1].eqiad.wmnet
  • 14:39 jelto@cumin1001: conftool action : set/pooled=yes; selector: name=mw141[4-9].eqiad.wmnet
  • 14:38 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1004.eqiad.wmnet
  • 14:23 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw142[0-1].eqiad.wmnet
  • 14:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw141[4-9].eqiad.wmnet
  • 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw142[0-1].eqiad.wmnet
  • 14:16 jelto@cumin1001: conftool action : set/weight=30; selector: name=mw141[4-9].eqiad.wmnet
  • 14:15 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw142[0-1].eqiad.wmnet
  • 14:12 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw141[4-9].eqiad.wmnet
  • 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts registry[2005-2008].codfw.wmnet
  • 13:54 jayme@cumin1001: START - Cookbook sre.hosts.decommission for hosts registry[2005-2008].codfw.wmnet
  • 13:32 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry200[5-8].codfw.wmnet,dc=codfw,cluster=docker-registry
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:22 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 13:20 dzahn@cumin1001: conftool action : set/pooled=yes; selector: name=mw2380.codfw.wmnet
  • 13:11 mutante: mw2380 - rebooting
  • 13:09 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw2380.codfw.wmnet
  • 13:08 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 12:24 moritzm: added btullis to pwstore
  • 12:06 mutante: mw2380 /puppetmaster: reimaged, revoking old cert, signing new cert, initial puppet run T285603
  • 11:51 mutante: mw2380 - PXE booting - does not boot from hard disk
  • 11:28 mutante: powercycling mw2380, trying to make it boot
  • 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:15 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:15 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 11:14 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 10:33 jforrester@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/WikibaseMediaInfo: UploadWizard/WikibaseMediaInfo fix 3fd2873 for T285579 (duration: 00m 59s)
  • 09:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1268.eqiad.wmnet
  • 09:37 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/HomepageHooks.php: Backport: Fix handling of geEnabled flag (T285996) (duration: 00m 57s)
  • 09:36 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1268.eqiad.wmnet
  • 09:24 godog: test thanos 0.21.1 locally on thanos-fe2001 and depool the host - T285835
  • 09:19 dcausse: restart blazegraph on wdqs1013
  • 09:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1267.eqiad.wmnet
  • 09:04 mutante: decom'ing mw1267
  • 09:02 moritzm: installing node-hosted-git-info security updates
  • 09:02 tgr: deploying emergency backport: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/702808
  • 08:54 moritzm: installing golang-docker-credential-helpers security updates
  • 08:26 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1267.eqiad.wmnet
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:24 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
  • 08:03 moritzm: installing ipmitool security updates
  • 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1268.eqiad.wmnet
  • 07:54 dzahn@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1267.eqiad.wmnet
  • 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1268.eqiad.wmnet
  • 07:53 dzahn@cumin1001: conftool action : set/pooled=no; selector: name=mw1267.eqiad.wmnet
  • 07:25 dcausse: installing openjdk-8-dbg on wdqs1013
  • 03:14 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo run-puppet-agent --force'`
  • 03:11 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo apt update'` fixed the issue
  • 03:07 ryankemper: T264053 `Error: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold install elasticsearch-madvise' returned 100: Reading package lists...` grr
  • 03:07 ryankemper: T264053 `ryankemper@elastic2054:~$ sudo run-puppet-agent --force`
  • 03:06 ryankemper: T264053 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/702791; will run puppet on single host
  • 03:05 ryankemper: T264053 `sudo -E cumin 'P:elasticsearch::cirrus' 'sudo disable-puppet "verify new deb package works - T264053"'`
  • 03:02 legoktm: uploaded elasticsearch-madvise_0.1~deb9u1_amd64.changes to stretch-wikimedia on apt1001
  • 01:47 eileen: civicrm revision changed from e07c2be1a7 to bb62188ec6, config revision is 1739c53fcb
  • 01:16 legoktm: uploaded elasticsearch-madvise 0.1 to apt.wm.o (T264053)

2021-07-01

  • 23:29 thcipriani@deploy1002: Synchronized README: Config: Revert "deployment training: readme whitespace" (duration: 00m 56s)
  • 23:21 thcipriani@deploy1002: Synchronized README: Config: deployment training: readme whitespace (duration: 00m 57s)
  • 22:37 urbanecm: Start server-side upload for 1 video file (T285182)
  • 22:36 urbanecm: Start server-side upload for 1 video file (T285789)
  • 22:31 dancy@deploy1002: Synchronized .pipeline: Config: Use train-versions.json to map from version to image tag (T282824) (duration: 00m 57s)
  • 22:27 urbanecm: Start server-side upload for 1 video file (T285682)
  • 21:43 dancy@deploy1002: Synchronized .pipeline/config.yaml: Config: Temporarily disable notification for security patch failures (duration: 00m 57s)
  • 19:45 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.12
  • 19:41 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 12s)
  • 19:39 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
  • 19:35 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/tests/phpunit/includes/TitleMethodsTest.php: Backport: Consistently normalize Title::mFragment before setting (T285951) (duration: 01m 10s)
  • 19:34 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/includes/Title.php: Backport: Consistently normalize Title::mFragment before setting (T285951) (duration: 01m 10s)
  • 19:18 brennen@deploy1002: Synchronized php-1.37.0-wmf.12/.pipeline/config.yaml: Backport: Trigger update-train-versions job at end of wmf-publish pipeline (duration: 01m 08s)
  • 18:55 otto@deploy1002: Finished deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883] (duration: 05m 19s)
  • 18:50 otto@deploy1002: Started deploy [analytics/refinery@7dea883] (hadoop-test): Deploying to analytics-test cluster for testing gobblin [analytics/refinery@7dea883]
  • 18:43 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7995f7a: Use Vue.js for QuickSurveys on available wikis (T285890) (duration: 01m 09s)
  • 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 654877f: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 12s)
  • 18:18 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/includes/Notifications/EventDispatcher.php: 6d90430: EventDispatcher: Ensure we fetch page content from the primary database (T285895) (duration: 01m 14s)
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: Revert "group1 wikis to 1.37.0-wmf.12"
  • 16:27 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:23 reedy@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/ConfirmEdit/SimpleCaptcha/SimpleCaptcha.php: T285959 (duration: 01m 20s)
  • 16:11 vgutierrez: restart varnish-fe on cp3059 - T285953
  • 14:58 papaul: poweroff mw2380 for disk replacement
  • 14:57 jiji@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2380.codfw.wmnet
  • 14:53 effie: depool mw2380 for disk repair - T285603
  • 14:51 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 14:51 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:45 moritzm: installing glib2.0 security updates on buster
  • 13:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts maps2002.codfw.wmnet
  • 13:35 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts maps2002.codfw.wmnet
  • 13:03 marostegui: Deploy schema change on s2 eqiad master T276150
  • 12:49 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1266.eqiad.wmnet
  • 12:39 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1266.eqiad.wmnet
  • 12:37 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 12:29 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[1264-1265].eqiad.wmnet
  • 12:23 tgr: EU deploys done
  • 12:22 tgr@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/: Backport: Welcome tour: Mark as complete when notice is shown (T284800) SuggestedEdits: Return default JS data as 'noresults' (T285906) (duration: 01m 08s)
  • 12:20 tgr@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: Backport: Welcome tour: Mark as complete when notice is shown (T284800) SuggestedEdits: Return default JS data as 'noresults' (T285906) (duration: 01m 09s)
  • 12:19 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw[1264-1265].eqiad.wmnet
  • 12:09 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1263.eqiad.wmnet
  • 11:58 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1263.eqiad.wmnet
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/Wikibase/: Backport: Stop using legacy entityNamespaces setting in onSetupAfterCache hook (T285472) (duration: 01m 15s)
  • 11:46 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1262.eqiad.wmnet
  • 11:35 elukey: reboot ml-serve-ctrl200[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
  • 11:35 marostegui: Deploy schema change on s8 eqiad master T276150
  • 11:33 elukey: reboot ml-serve-ctrl100[1,2] to increase vcpus/memory (1->2 vcores, 2->4g of memory)
  • 11:33 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1262.eqiad.wmnet
  • 11:19 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 11:10 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Avoid using MWNamespace (duration: 01m 06s)
  • 11:07 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:27 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 10:05 moritzm: installing remaining libgcrypt20 security updates
  • 09:56 moritzm: installing remaining gnutls28 security updates
  • 09:55 Amir1: start of clean up of autoreview logs in ruwiki (T285608)
  • 09:47 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'tegola-vector-tiles' for release 'main' .
  • 09:36 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:05 marostegui: Deploy schema change on s1 eqiad (db1157) master T277123
  • 08:52 marostegui: Deploy schema change on s1 eqiad (db1163) master T277123
  • 08:50 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw1261.eqiad.wmnet
  • 08:28 jelto@cumin1001: START - Cookbook sre.hosts.decommission for hosts mw1261.eqiad.wmnet
  • 08:23 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw126[2-6].eqiad.wmnet
  • 08:22 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw126[2-6].eqiad.wmnet
  • 08:13 jelto@cumin1001: conftool action : set/pooled=inactive; selector: name=mw1261.eqiad.wmnet
  • 08:11 jelto@cumin1001: conftool action : set/pooled=no; selector: name=mw1261.eqiad.wmnet
  • 07:06 marostegui: Deploy schema change on s4 eqiad (db1138) master T277123
  • 06:34 marostegui: Deploy schema change on s7 eqiad (db1136) master T277123
  • 06:31 marostegui: Deploy schema change on s2,s8 eqiad masters T277123
  • 05:57 marostegui: Deploy schema change on s5 eqiad master (db1130) T277123
  • 05:55 marostegui: Deploy schema change on s6 eqiad master (db1173) T277123
  • 05:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1129', diff saved to https://phabricator.wikimedia.org/P16750 and previous config saved to /var/cache/conftool/dbconfig/20210701-055243-marostegui.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P16749 and previous config saved to /var/cache/conftool/dbconfig/20210701-052702-marostegui.json
  • 04:48 marostegui: Disconnect eqiad -> codfw replication from s1-s8

2021-06-30

  • 23:28 urbanecm: Evening B&C window finished
  • 23:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 667d880: Add Parsoid to wmgMonologChannels with warning level (duration: 01m 07s)
  • 23:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 8e719d5: Add Parsoid to wmgMonologChannels (duration: 00m 38s)
  • 23:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8e719d5: Add Parsoid to wmgMonologChannels (duration: 01m 07s)
  • 21:43 Amir1: deleting auto-review logs from test2wiki (T285608)
  • 21:40 Amir1: end of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394)
  • 21:29 cstone: civicrm revision changed from 789c92d13b to e07c2be1a7
  • 21:23 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https (T284931 T284459 T284394)
  • 19:06 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.12 (duration: 01m 07s)
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.12
  • 18:57 legoktm: legoktm@mwmaint2002:~$ sudo systemctl start mediawiki_job_purge_parsercache_pc[123] # to start split purge jobs ahead of the timers
  • 18:54 legoktm: legoktm@mwmaint2002:~$ sudo systemctl stop mediawiki_job_parser_cache_purging.service # to stop zombie service
  • 18:53 Amir1: adding urbanecm as admin of newprojects mailing list
  • 18:12 Jeff_Green: authdns-update to deploy A/PTR records for frdev1002.frack.eqiad.wmnet
  • 17:57 thcipriani: restart ci jenkins following upgrade
  • 17:54 thcipriani: restart releases-jenkins following upgrade
  • 17:16 moritzm: imported jenkins 2.289.2 to thirdparty/ci T285532
  • 16:30 urbanecm: mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki 'Tech/Server_switch_2020' 'Tech/Server_switch' 'Martin Urbanec' --move-subpages --reason='per phab:T285866' # T285866
  • 16:10 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 46s)
  • 16:08 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 01s)
  • 16:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating banwikisource (T284389) (duration: 01m 20s)
  • 16:04 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating banwikisource (T284389) (duration: 01m 16s)
  • 16:03 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating banwikisource (T284389) (duration: 01m 17s)
  • 16:02 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating banwikisource (T284389)
  • 16:00 urbanecm@deploy1002: Synchronized dblists: Creating banwikisource (T284389) (duration: 01m 17s)
  • 15:58 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating banwikisource (T284389) (duration: 01m 14s)
  • 15:57 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating banwikisource (T284389) (duration: 01m 13s)
  • 15:48 urbanecm@deploy1002: Synchronized langlist: Creating shiwiki (T284885) (duration: 01m 16s)
  • 15:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating shiwiki (T284885) (duration: 01m 16s)
  • 15:46 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating shiwiki (T284885) (duration: 01m 13s)
  • 15:44 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating shiwiki (T284885) (duration: 01m 15s)
  • 15:43 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating shiwiki (T284885)
  • 15:41 urbanecm@deploy1002: Synchronized dblists: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:40 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:38 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating shiwiki (T284885) (duration: 01m 14s)
  • 15:31 urbanecm@deploy1002: Synchronized langlist: Creating dagwiki (T284450) (duration: 01m 12s)
  • 15:30 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Creating dagwiki (T284450) (duration: 01m 14s)
  • 15:28 urbanecm@deploy1002: Synchronized wmf-config/logos.php: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:27 urbanecm@deploy1002: Synchronized static/images/project-logos/: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:26 urbanecm@deploy1002: rebuilt and synchronized wikiversions files: Creating dagwiki (T284450)
  • 15:25 urbanecm: [urbanecm@mwmaint2002 ~]$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=dagwiki --cluster=all # T284450
  • 15:24 urbanecm@deploy1002: Synchronized dblists: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:22 urbanecm@deploy1002: Synchronized wmf-config/db-codfw.php: Creating dagwiki (T284450) (duration: 01m 13s)
  • 15:21 urbanecm@deploy1002: Synchronized wmf-config/db-eqiad.php: Creating dagwiki (T284450) (duration: 01m 16s)
  • 15:07 sukhe: restarted dnsdist.service and pdns-recursor.service on O:wikidough to install gnutls/gcrypt updates
  • 15:06 urbanecm: sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1'
  • 13:32 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 13:26 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 13:26 moritzm: installing fluidsynth security updates on stretch
  • 13:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 13:18 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 13:18 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
  • 13:11 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 13:04 mutante: switching docker-registry to nginx light variant T164456
  • 13:00 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 12:53 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 12:52 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
  • 12:46 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 12:45 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 12:39 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 12:35 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
  • 12:29 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 12:29 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
  • 12:24 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 12:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
  • 12:17 kart_: Updated cxserver to 2021-06-30-112813-production (T284900, T284885)
  • 12:13 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 12:11 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:06 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 12:01 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 11:46 Lucas_WMDE: EU backport+config window done
  • 11:46 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientRepoConceptBaseUri (T257260) (2/2, beta) (disregard the earlier /3, I’m skipping the test file after all) (duration: 01m 04s)
  • 11:44 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientRepoConceptBaseUri (T257260) (1/3, prod) (duration: 01m 16s)
  • 11:35 moritzm: rolling restart of FPM/Apache on mw canaries to pick up gnutls/gcrypt security updates
  • 11:11 moritzm: installing libgcrypt security updates on buster
  • 11:09 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug2001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n /srv/mediawiki/php-1.37.0-wmf.1/cache /srv/mediawiki/php-1.37.0-wmf.1' # clean up old l10n cache
  • 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase client repoConceptBaseUri (T257260) (duration: 01m 24s)
  • 10:44 moritzm: installing gnutls security updates on buster
  • 10:31 godog: add 200G to prometheus/eqiad for 'ops' instance
  • 09:35 godog: start swiftrepl-mw on ms-fe2005 post-switchover (credentials were missing) - T162123
  • 08:51 jelto: jelto@puppetmaster1001:~$ sudo puppet cert -s gitlab2001.wikimedia.org # approve puppet certificate request for gitlab2001, fingerprint checked
  • 08:47 topranks: Removing BGP peers for AS48237 (Etihad Etisalat) and AS11404 (Wave Division Holdings) from cr2-eqiad (peers have left Equinix IX)
  • 08:31 godog: remove sdf1 from thanos-be1003 in swift - T285835
  • 07:43 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host thanos-be1003.eqiad.wmnet
  • 07:43 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 07:43 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host thanos-be1003.eqiad.wmnet
  • 07:37 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 05:46 ryankemper: [Cirrus] Unbanned `elastic2045`; now only `elastic2033` is banned in `codfw`
  • 00:36 tstarling@deploy1002: Synchronized wmf-config/db-labs.php: gerrit 701995 SQL query log (duration: 01m 05s)
  • 00:35 tstarling@deploy1002: Synchronized wmf-config/db-eqiad.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:34 tstarling@deploy1002: Synchronized wmf-config/db-codfw.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:32 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: gerrit 701995 SQL query log (duration: 01m 05s)
  • 00:31 tstarling@deploy1002: Synchronized docroot/noc/db.php: gerrit 701995 SQL query log (duration: 01m 06s)
  • 00:27 tstarling@deploy1002: Synchronized wmf-config/logging.php: gerrit 701995 SQL query log (duration: 01m 15s)
  • 00:01 urbanecm: (following up previous SAL item) TrainBranchBot was removed from wmf-deployment group because of T285819

2021-06-29

  • 23:45 urbanecm: Evening B&C window done
  • 23:45 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 367bc98: 904d18720: flood flag changes for enwikibooks (T285594) (duration: 01m 07s)
  • 23:45 urbanecm: Remove TrainBranchBot from wmf-deployment Gerrit group; it merges code to mediawiki-config without actually deploying it
  • 23:41 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 8a5b835: SpecialEditGrowthConfig: Do not use relative => true (T285750) (duration: 01m 04s)
  • 23:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: c61fb17: SpecialEditGrowthConfig: Do not use relative => true (T285750) (duration: 01m 05s)
  • 23:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/DiscussionTools/: bad8266: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 05s)
  • 23:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: bad8266: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 06s)
  • 23:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.12/extensions/DiscussionTools/: e77e002: Config option to enable topic subscriptions backend and dtenable=1 URL parameter (T284491) (duration: 01m 09s)
  • 21:58 maryum: deployed security patch T285515 to wmf.12
  • 21:51 maryum: deployed security patch T285515 to wmf.11
  • 21:44 maryum: deployed updated security patch for T285190 to wmf.12
  • 21:42 maryum: deployed updated security patch for T285190 to wmf.11
  • 21:31 sbassett: Reverted and deployed updated security patch for T285190 to wmf.12
  • 21:29 sbassett: Reverted and deployed updated security patch for T285190 to wmf.11
  • 21:19 sbassett: Deployed updated security patch for T285190 to wmf.11
  • 20:55 dancy: Deleted all CDB files on beta so they'll be recreated on the next scap sync-world run
  • 20:26 dancy: Reverting to scap 3.17.1-1+0~20210419163335.8~1.gbpa6b2e0 in beta
  • 19:43 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
  • 19:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1020.eqiad.wmnet with reason: REIMAGE
  • 19:35 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
  • 19:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: REIMAGE
  • 19:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
  • 19:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: REIMAGE
  • 19:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
  • 19:14 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: REIMAGE
  • 19:07 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.12
  • 18:57 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:34 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc3
  • 18:28 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc2
  • 18:21 Krinkle: krinkle@mwmaint2002.codfw: mwscript purgeParserCache.php --wiki=aawiki --age=1814400 --msleep 200 --tag pc1
  • 18:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: REIMAGE
  • 18:07 brennen@deploy1002: Pruned MediaWiki: 1.37.0-wmf.7 (duration: 04m 00s)
  • 17:59 urbanecm: Start server-side upload of ~2.5G of JPG files (T282755)
  • 17:52 brennen@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.12 (duration: 57m 11s)
  • 16:55 ryankemper: T281327 `[Cirrus -> codfw]` Current banned nodes are `elastic2043` and `elastic2045`; `elastic2043` can be unbanned after a re-image, and `elastic2045` can be unbanned in ~30 minutes after shards rebalance (had heavy shards scheduled)
  • 16:55 brennen@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.12
  • 16:45 brennen: 1.37.0-wmf.12 was branched at 3703c31 for T281153
  • 16:28 ebernhardson: temporarily ban elastic2045 from production-search-codfw
  • 15:43 dcausse: unbanning elastic2054
  • 15:30 dcausse: restarting blazegraph on wdqs1012
  • 15:17 effie: pool mw2383 back
  • 15:15 mutante: [mwlog2002:~] $ sudo systemctl start mw-log-cleanup
  • 15:06 dcausse: banning elastic2054
  • 14:53 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[1-2].codfw.wmnet,service=canary
  • 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw227[8-9].codfw.wmnet,service=canary
  • 14:52 dzahn@cumin1001: conftool action : set/weight=1; selector: name=mw225[1-2].codfw.wmnet,service=canary
  • 14:52 effie: depool mw2383 as it is misbehaving
  • 14:47 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 14:47 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw226[1-2].codfw.wmnet
  • 14:47 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw2290.codfw.wmnet
  • 14:46 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:46 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw22[7-8][0-9].codfw.wmnet
  • 14:45 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet
  • 14:44 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:44 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw225[1-8].codfw.wmnet,service=api_appserver
  • 14:43 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 14:38 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 14:38 _joe_: restarting php-fpm on mw2383
  • 14:38 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 14:37 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2103 (s1) weight a bit', diff saved to https://phabricator.wikimedia.org/P16739 and previous config saved to /var/cache/conftool/dbconfig/20210629-143742-marostegui.json
  • 14:37 _joe_: repooling mw2383
  • 14:36 _joe_: depooling mw2383
  • 14:30 legoktm@deploy1002: Synchronized wmf-config/db-codfw.php: fix trwikivoyage (duration: 01m 01s)
  • 14:29 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:28 Krinkle: TODO: Don't duplicate `sectionsByDB` between db-* files
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:23 jayme@cumin1001: MediaWiki read-only period ends at: 2021-06-29 14:23:23.504447
  • 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 14:23 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 14:23 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 14:22 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 14:22 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 14:21 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 14:21 jayme@cumin1001: MediaWiki read-only period starts at: 2021-06-29 14:21:26.671853
  • 14:21 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 14:15 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 14:15 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 14:13 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 44 hosts with reason: DC switchover
  • 14:13 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 44 hosts with reason: DC switchover
  • 14:12 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:11 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:10 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 14:09 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 14:08 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 14:02 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 14:01 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 14:01 jayme@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:51 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@edc31a2]
  • 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2] (duration: 00m 07s)
  • 13:49 otto@deploy1002: Started deploy [analytics/refinery@edc31a2] (thin): Regular analytics weekly train THIN [analytics/refinery@edc31a2]
  • 13:49 otto@deploy1002: Finished deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH] (duration: 17m 42s)
  • 13:35 volker-e@deploy1002: Finished deploy [design/style-guide@e97fccb]: Deploy design/style-guide: e97fccb styles: Add internationalization and accessibility note labels and treatments (#476) (duration: 00m 07s)
  • 13:34 volker-e@deploy1002: Started deploy [design/style-guide@e97fccb]: Deploy design/style-guide: e97fccb styles: Add internationalization and accessibility note labels and treatments (#476)
  • 13:31 otto@deploy1002: Started deploy [analytics/refinery@edc31a2]: Regular analytics weekly train [analytics/refinery@COMMIT_HASH]
  • 11:54 phuedx@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: vector: Finish enabling language switcher treatment A/B test on fawiki (T269093) (duration: 00m 56s)
  • 11:38 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634), Part II (duration: 00m 58s)
  • 11:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/Wikibase/repo/includes/Rdf/PropertyStubRdfBuilder.php: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634), Part I (duration: 00m 56s)
  • 11:35 ladsgroup@deploy1002: sync-file aborted: Backport: Use EntityLookup backed TermLookup for Rdf PropertyStubs (T285634) (duration: 00m 10s)
  • 10:30 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on acmechief* after switch towards nginx-light T164456
  • 09:27 moritzm: installing nettle security updates on buster
  • 08:47 elukey: repool mw13[55,84] after debugging - T285634
  • 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1384.eqiad.wmnet
  • 08:46 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=mw1355.eqiad.wmnet
  • 08:43 gehel@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=elastic1039.eqiad.wmnet
  • 08:25 elukey: cumin 'A:mw-eqiad' '/usr/local/sbin/restart-php7.2-fpm' -b 2 -s 30 - T285634
  • 08:21 elukey: depool mw1355 (mw appserver) for debugging - T285634
  • 08:21 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1355.eqiad.wmnet
  • 08:12 hashar: Upgrading Jenkins on contint2001 / contint1001 and restarting CI Jenkins # T285531
  • 08:03 hashar: Upgraded Jenkins on releases1002 / releases2002 # T285531
  • 08:02 hashar: Upgraded Jenkins on releases1002 / releases2002
  • 07:50 godog: remove 20G migration data /root/prometheus from prometheus4001 - T243057
  • 07:48 godog: remove old /root/prometheus data from prometheus4001
  • 07:05 moritzm: upgrading bullseye early installs to the latest state of testing T275873
  • 06:46 tstarling@deploy1002: Synchronized php-1.37.0-wmf.11/includes/MediaWiki.php: Add statsd action timing metric T284274 (duration: 00m 58s)
  • 02:47 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload and A:codfw' 'run-puppet-agent -q'
  • 02:34 ryankemper: T285643 Banned `elastic1039` from all 3 elasticsearch clusters and set `elastic1039.eqiad.wmnet` to failed in netbox
  • 02:27 cdanis: ✔️ cdanis@cumin2001.codfw.wmnet ~ 🕥🍺 sudo cumin -b16 'A:cp-upload' 'run-puppet-agent -q'
  • 02:25 eileen: civicrm revision changed from 927ab7cff7 to 789c92d13b, config revision is 1739c53fcb
  • 02:04 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@0e916b1]: 0.3.75 (duration: 08m 40s)
  • 01:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.75` on canary `wdqs1003`; proceeding to rest of fleet
  • 01:56 ryankemper@deploy1002: Started deploy [wdqs/wdqs@0e916b1]: 0.3.75
  • 01:50 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.75`. Pre-deploy tests passing on canary `wdqs1003`
  • 00:25 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc1, ref T282761

2021-06-28

  • 23:07 urbanecm: Evening B&C window done
  • 23:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5ec855d: Enable Parsoid inspired media structure on test wikis (T51097) (duration: 00m 59s)
  • 22:51 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0)
  • 22:51 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-update-tendril
  • 22:50 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 22:48 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 22:48 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters (exit_code=0)
  • 22:44 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-run-puppet-on-db-masters
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 22:43 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2021-06-28 22:43:04.512602
  • 22:43 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 22:43 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 22:42 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 22:42 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 22:41 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 22:41 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-28 22:41:41.222740
  • 22:41 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 22:40 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 22:40 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:38 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 22:38 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 22:32 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 22:32 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 22:31 legoktm: starting DC switchover live test, which will "switch" us from codfw -> eqiad
  • 22:28 eileen: civicrm revision changed from 9d1203fb28 to 927ab7cff7, config revision is 1739c53fcb
  • 22:09 legoktm: live-hacked spicerack on cumin1001 to ignore x2, see https://phabricator.wikimedia.org/T285519#7182377
  • 21:55 Krinkle: krinkle@mwmaint1002: purgeParserCache.php --tag pc2, ref T282761
  • 20:03 cstone: payments-wiki revision is d9892207c1
  • 19:48 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/maintenance/: I618bc1 (duration: 00m 56s)
  • 19:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/libs/objectcache/: T282761 - I618bc1 (duration: 00m 56s)
  • 19:45 krinkle@deploy1002: Synchronized php-1.37.0-wmf.11/includes/objectcache/SqlBagOStuff.php: T282761 - I618bc1 (duration: 00m 59s)
  • 18:40 ebernhardson@deploy1002: Synchronized wmf-config/: T281515: Prepare Cirrus more_like for dc switchover (duration: 01m 02s)
  • 18:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/WelcomeSurveyHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 3/3) (duration: 00m 55s)
  • 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HelpPanelHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 2/3) (duration: 00m 55s)
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/includes/HomepageHooks.php: ecf1d6c: Make it possible to force opt-in/opt-out to Growth features during account creation (T284119; T284800; 1/3) (duration: 00m 58s)
  • 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor/: 794a46c: Hotfix for broken "Extract show all to placeholder class" (T284636; T285571) (duration: 00m 57s)
  • 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4ae0fdd: Enable DiscussionTools topicsubscription as beta feature on partner wikis (T274280) (duration: 00m 57s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b59184: Remove redundant wgDiscussionToolsEnable overrides (duration: 00m 56s)
  • 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1043c93: Growth: Enable community configuration at all Growth wikis (T285423) (duration: 00m 56s)
  • 16:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 sukhe: Traffic: depool eqiad from user traffic
  • 15:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:30 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=thanos-.*,name=eqiad
  • 15:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0)
  • 15:08 jayme@cumin1001: START - Cookbook sre.switchdc.services.02-restore-ttl
  • 15:07 gehel: restarting wdqs-updater on all wdqs hosts for new configuration
  • 14:54 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0)
  • 14:53 jayme@cumin1001: Switching services swift, proton, mathoid, restbase, swift-ro, eventstreams, search, shellbox, eventgate-analytics-external, wdqs-internal, kartotherian, api-gateway, termbox, mobileapps, similar-users, wikifeeds, apertium, restbase-async, eventgate-main, eventgate-logging-external, ores, sessionstore, linkrecommendation, echostore, push-notifications, citoid, zotero, eventgate-analytics, wdqs, eventstreams-i
  • 14:53 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:37 jayme@cumin1001: END (FAIL) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=99)
  • 14:36 jayme@cumin1001: Switching services kartotherian, proton, wdqs-internal, wikifeeds, zotero, recommendation-api, swift-ro, linkrecommendation, mobileapps, citoid, eventgate-analytics, push-notifications, eventstreams-internal, mathoid, similar-users, schema, apertium, restbase-async, shellbox, termbox, wdqs, ores, eventgate-analytics-external, swift, helm-charts, restbase, cxserver, search, sessionstore, eventstreams, api-gate
  • 14:36 jayme@cumin1001: START - Cookbook sre.switchdc.services.01-switch-dc
  • 14:35 jayme@cumin1001: END (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0)
  • 14:29 jayme@cumin1001: START - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep
  • 14:21 effie: restarted mw[1322,1329,1333,1350,1351,1352,1353,1354,1366,1367,1368,1370,1372,1373]
  • 14:07 effie: restarting busy php-fpm app servers
  • 13:07 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseRepoForeignRepositories (T257260) (2/2, beta) (duration: 00m 57s)
  • 13:06 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseRepoForeignRepositories (T257260) (1/2, prod) (duration: 00m 57s)
  • 12:59 moritzm: installing intel-microcode security updates on buster
  • 12:30 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/MediaHandler.php: Backport: media: Handle lack of 'metadata' key from getSizeAndMetadata gracefully (T285490) (duration: 00m 56s)
  • 12:24 dcausse: repool wdqs1012
  • 12:00 Lucas_WMDE: EU backport+config window done
  • 11:50 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase repo foreignRepositories (T257260) (duration: 00m 55s)
  • 11:40 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-eqiad
  • 11:38 XioNoX: push "Port cloud-in4 to Capirca" to cr1/2-codfw
  • 11:34 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e4a088f: vector: Enable language switcher treatment A/B test on fawiki (T269093) (duration: 00m 55s)
  • 11:28 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/signup/campaign.less: cd16aa2: Donor campaign: fix signup page styling (T284740) (duration: 00m 56s)
  • 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9495d18: GrowthExperiments: Update campaign pattern (T284800) (duration: 00m 56s)
  • 11:20 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
  • 11:18 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
  • 11:18 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1007.eqiad.wmnet with reason: Resyncing from buster master maps1009
  • 11:18 Lucas_WMDE: lucaswerkmeister-wmde@mw1384:~$ scap pull # did not print any errors
  • 11:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ade641b: Deploy ContentTranslation out of Beta feature in 9 WPs (T284641) (duration: 00m 56s)
  • 10:44 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:43 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:25 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
  • 10:23 mutante: sodium - restarted nginx
  • 10:23 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2007.codfw.wmnet with reason: REIMAGE
  • 10:22 mutante: sodium (mirrors.wikimedia.org) - switching to nginx light variant T164456
  • 10:11 vgutierrez: rolling upgrade of ATS on eqiad - T285535
  • 10:11 moritzm: installing remaining libxml2 security updates
  • 09:52 vgutierrez: rolling upgrade of ATS on esams - T285535
  • 09:42 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings-labs.php: Config: Remove $wmgWikibaseClientChangesDatabase (T257260) (2/2, beta) (duration: 00m 56s)
  • 09:41 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove $wmgWikibaseClientChangesDatabase (T257260) (1/2, prod) (duration: 00m 57s)
  • 09:40 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 09:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 09:39 Lucas_WMDE: ^ wrong gerrit change used for message, sorry
  • 09:39 lucaswerkmeister-wmde@deploy1002: sync-file aborted: Config: Stop setting Wikibase repo foreignRepositories (T257260) (1/2, prod) (duration: 00m 10s)
  • 09:27 vgutierrez: rolling upgrade of ATS on eqsin - T285535
  • 09:14 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Stop setting Wikibase client changesDatabase (T257260) (duration: 00m 55s)
  • 08:56 vgutierrez: rolling upgrade of ATS on codfw - T285535
  • 08:53 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538), Part II (duration: 00m 57s)
  • 08:51 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set idGeneratorInErrorPingLimiter to 9 for Wikidata (T284538), Part I (duration: 00m 56s)
  • 08:48 mutante: phab1001 - removing 2fa for my own account
  • 08:40 vgutierrez: rolling upgrade of ATS on ulsfo - T285535
  • 08:40 jayme: drain kubestage2002 for docker restart(s)
  • 08:33 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove idGeneratorRateLimiting from production config (T274157), Part II (duration: 00m 55s)
  • 08:31 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Remove idGeneratorRateLimiting from production config (T274157), Part I (duration: 00m 58s)
  • 08:27 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Remove special configurations for Dagbani in Wikibase code (T283168) (duration: 00m 56s)
  • 08:25 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
  • 08:23 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1171.eqiad.wmnet with reason: REIMAGE
  • 08:21 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Set Wikidata's main sandbox item (T219215), Part II (duration: 00m 56s)
  • 08:19 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set Wikidata's main sandbox item (T219215), Part I (duration: 00m 57s)
  • 08:19 jynus: stop and remove db1145:s5 db2099:s5 T283235
  • 07:58 dcausse: depool and restart blazegraph on wdqs1012
  • 07:57 jelto: jelto@cumin1001:~$ sudo cumin install* 'run-puppet-agent' # update DHCP entry for gitlab2001 on install[1003,2003,3001,4001,5001].wikimedia.org
  • 07:57 dcausse: repool wdqs1005
  • 07:46 hashar@deploy1002: Finished deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana (duration: 00m 08s)
  • 07:46 hashar@deploy1002: Started deploy [integration/docroot@cf677eb]: integration: Change agents dashboard link from Nagf to Grafana
  • 06:16 XioNoX: remove BGP to AS13768 in AMS-IX

2021-06-27

  • 09:10 elukey: cumin 'A:mw-eqiad and not P{mw13[67,54,55,72,33,50,51,73,52,49,53,65,71,84,68,70,66,91,89,97,95,99,85,93,87]*} and not P{mw14[09,03,11,07,05,01]*} and not P{mw12[61-69]*} and not P{mwdebug*}' '/usr/local/sbin/restart-php7.2-fpm' -b 1 -s 30
  • 09:10 elukey: roll restart the remaining mw appservers to clear out apcu fragmentation (cumin command to follow)
  • 08:58 elukey: slow roll restart (cumin -b 1 -s 30) of mw126[1-7]'s php-fpm (75-80% of apcu fragmentation)
  • 08:37 elukey: restart php-fpm on mw1268 mw1269 - low idle workers
  • 08:23 elukey: restart php-fpm on mw1401

2021-06-26

  • 21:28 volans: upgraded spicerack to v0.0.56 on the cumin hosts (includes only bug fixes for the switchdc)
  • 21:23 volans: uploaded spicerack_0.0.56 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 16:37 elukey: restart php-fpm on mw1387
  • 15:43 elukey: restart php-fpm on mw1393
  • 15:39 elukey: restart php-fpm on mw1405 mw1399 mw1385
  • 15:37 elukey: restart php-fpm on mw1397 mw1395 mw1411 mw1407
  • 15:31 elukey: restart php-fpm on mw1391 mw1389 mw1403
  • 13:49 elukey: restart php-fpm on mw1368 mw1370 mw1366 mw1409
  • 13:43 elukey: depool mw1384 for investigation
  • 13:43 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=mw1384.eqiad.wmnet
  • 13:33 elukey: restart php-fpm on mw1353 mw1365 mw1371
  • 13:30 elukey: restart php-fpm on mw1351 mw1373 mw1352 mw1349
  • 13:23 elukey: restart-phpfpm on mw1350 (0 idle php workers)
  • 13:20 elukey: restart-phpfpm on mw1333 (0 idle php workers)
  • 10:08 elukey: restart php-fpm on mw1372 - T285593
  • 10:07 elukey: restart php-fpm on mw1372 - T285593
  • 09:45 elukey: restart php-fpm on mw135[4-5]
  • 09:44 elukey: restart php-fpm on mw1354
  • 09:38 elukey: reboot mw1414 (not reachable via ssh, nor via mgmt console)
  • 09:33 elukey: restart php-fpm on mw1367 (php fatal memory errors, php7adm /apcu-frag returns errors)

2021-06-25

  • 21:37 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/: cirrus: Revert "Stop querying ores_articletopic" (3/3) (duration: 01m 01s)
  • 21:35 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/includes/Wikimedia/WeightedTagsHooks.php: cirrus: Revert "Stop querying ores_articletopic" (2/3) (duration: 00m 58s)
  • 21:34 ebernhardson@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/CirrusSearch/includes/Parser/FullTextKeywordRegistry.php: cirrus: Revert "Stop querying ores_articletopic" (1/3) (duration: 00m 58s)
  • 20:32 legoktm: legoktm@mwmaint1002:~$ sudo systemctl reset-failed # to clear icinga alert
  • 20:28 legoktm: legoktm@mwmaint1002:~$ sudo systemctl start mediawiki_job_update_special_pages.service (T285583)
  • 20:21 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.Homepage.SuggestedEdits.js: eaec745: SuggestedEdits: Only log task impression for EditCardWidget (T283546; emergency deployment) (duration: 01m 00s)
  • 18:08 legoktm: legoktm@ms-fe2005:~$ sudo systemctl unmask swiftrepl-mw.service
  • 15:46 mutante: mw1326, mw1327, mw1328, mw1329 ... restarted php-fpm
  • 15:41 mutante: mw1330, mw1320, mw1321, mw1322 - restarted php-fpm
  • 15:38 mutante: [mw1330:~] $ sudo restart-php7.2-fpm
  • 15:36 mutante: [mw1332:~] $ sudo restart-php7.2-fpm
  • 15:28 mutante: [mw1319:~] $ sudo restart-php7.2-fpm
  • 15:20 rzl: rzl@mw1320:~$ sudo restart-php7.2-fpm # workers stuck since the ~14:00 request spike
  • 15:03 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:44 jelto@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host gitlab2001.wikimedia.org
  • 14:28 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps2007.codfw.wmnet with reason: reimaging as buster replica
  • 14:28 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps2007.codfw.wmnet with reason: reimaging as buster replica
  • 13:50 jelto@cumin1001: START - Cookbook sre.ganeti.makevm for new host gitlab2001.wikimedia.org
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 13:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 13:08 vgutierrez: update ATS to version 8.0.8-1wm4 on cp4026 and cp4032 - T285535
  • 13:06 vgutierrez: upload trafficserver 8.0.8-1wm4 to apt.wm.o (buster) - T285535
  • 12:28 moritzm: installing nmap bugfix update from Buster point release
  • 12:28 moritzm: installing nmal bugfix update from Buster point release
  • 11:28 moritzm: installing 4.19.194 kernels on Buster from latest 10.10 point release (no reboots, just rolling out the packages)
  • 09:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol[1003-1005].wikimedia.org with reason: openstack issue
  • 09:15 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol[1003-1005].wikimedia.org with reason: openstack issue
  • 09:13 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on cloudcontrol1003.wikimedia.org with reason: Known issue, working on it
  • 09:13 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on cloudcontrol1003.wikimedia.org with reason: Known issue, working on it
  • 09:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 09:02 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 09:02 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 08:55 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 08:54 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 08:52 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 08:52 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 08:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 08:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 08:48 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 08:12 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 08:07 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 08:07 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 08:04 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 08:04 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 08:01 elukey: reboot an-worker1101 to unblock stuck GPU
  • 08:00 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 08:00 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 07:58 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 07:58 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 07:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 07:42 moritzm: imported Jenkins 2.289.1 to thirdparty/ci for buster-wikimedia T285531
  • 07:30 dcausse: depool and restart blazegraph on wdqs1005
  • 07:17 dcausse: installing openjdk-8-dbg on wdqs1005 to debug blazegraph

2021-06-24

  • 23:02 legoktm: reverted cumin1001 spicerack live hacks
  • 22:57 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 22:55 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0)
  • 22:55 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restore-ttl
  • 22:36 volans: set x2 codfw master back to RW
  • 22:30 legoktm@cumin1001: END (ERROR) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=97)
  • 22:29 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 22:29 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 22:29 legoktm@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2021-06-24 22:29:25.643909
  • 22:29 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 22:09 legoktm@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 22:09 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 22:06 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0)
  • 22:05 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-warmup-caches
  • 22:04 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 22:04 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 22:01 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 22:01 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:59 legoktm@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 21:59 legoktm@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 21:47 legoktm: live hacked spicerack on cumin1001 to revert https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/700963/
  • 20:58 legoktm: starting dry run and live test of DC switchover
  • 20:53 legoktm: legoktm@phab1001:~$ sudo /srv/phab/phabricator/bin/remove destroy M320 (spam)
  • 20:44 volans: uploaded spicerack_0.0.55 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 20:28 legoktm: re-enabled daily digests for wikimedia-l - T285486
  • 19:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.11
  • 19:07 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
  • 19:06 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 19:04 dduvall: preparing to roll group2 to 1.37.0-wmf.11 (T281152) (cc risky patch contacts Amir1 Krinkle DannyS712)
  • 19:02 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
  • 17:11 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 17:08 dduvall: re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes (cc risky patch contacts Amir1 Krinkle DannyS712)
  • 16:12 twentyafterfour: restarted php7.3-fpm on phab1001
  • 15:43 hnowlan: running `nodetool decommission` on maps2007
  • 15:42 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2007.codfw.wmnet
  • 15:42 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2007.codfw.wmnet with reason: depooling and reimaging as buster replica
  • 15:42 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2007.codfw.wmnet with reason: depooling and reimaging as buster replica
  • 15:31 moritzm: installing jackson-databind security updates
  • 15:26 moritzm: installing ruby-websocket-extensions security updates
  • 15:02 hnowlan: reenabling puppet on P{C:Postgresql::Slave}
  • 14:59 moritzm: restarting mw canaries to pick up libxml2 security update
  • 14:57 moritzm: installing libxml2 security updates on buster
  • 14:46 hnowlan: Disabling puppet on P{C:Postgresql::Slave} (netboxdb2001,puppetdb2002, most maps hosts) to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/700071
  • 13:29 volans: uploaded python3-wmflib_0.0.8 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 12:45 tgr: EU deploys done
  • 12:44 tgr@deploy1002: Finished scap: Backport: Re-apply "Add custom signup flow for donors", step 3 (T284799 T284740 T284800 T285281) (duration: 26m 07s)
  • 12:18 tgr@deploy1002: Started scap: Backport: Re-apply "Add custom signup flow for donors", step 3 (T284799 T284740 T284800 T285281)
  • 12:08 tgr@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments: Backport: Re-apply "Add custom signup flow for donors", step 2 (T284799 T284740 T284800 T285281) (duration: 01m 06s)
  • 11:53 jayme: import dragonfly_1.0.6-1 into buster-wikimedia
  • 11:44 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on registry2008.codfw.wmnet with reason: Dragonfly tests (jayme)
  • 11:44 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on registry2008.codfw.wmnet with reason: Dragonfly tests (jayme)
  • 11:37 jayme: depooling registry2008 for some dragonfly testing
  • 11:37 jayme@cumin1001: conftool action : set/pooled=inactive; selector: name=registry2008.codfw.wmnet,dc=codfw,cluster=docker-registry
  • 11:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments: Backport: Re-apply "Add custom signup flow for donors", step 1 (T284799 T284740 T284800 T285281) (duration: 01m 06s)
  • 11:25 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Update $wgNamespacesToBeSearchedDefault for wikimania (T284793) (duration: 01m 07s)
  • 11:21 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable OCR tool on all Wikisources (T285311) (duration: 01m 06s)
  • 11:11 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable link recommendation feature for more wikis (T284481) (duration: 01m 07s)
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16723 and previous config saved to /var/cache/conftool/dbconfig/20210624-092226-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16722 and previous config saved to /var/cache/conftool/dbconfig/20210624-092157-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16721 and previous config saved to /var/cache/conftool/dbconfig/20210624-092105-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16720 and previous config saved to /var/cache/conftool/dbconfig/20210624-092029-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s1 weights T284897', diff saved to https://phabricator.wikimedia.org/P16719 and previous config saved to /var/cache/conftool/dbconfig/20210624-091949-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s2 weights T284897', diff saved to https://phabricator.wikimedia.org/P16718 and previous config saved to /var/cache/conftool/dbconfig/20210624-091753-marostegui.json
  • 09:02 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes: Backport: media: Make the file metadata "_error" check looser (T285431) (duration: 01m 12s)
  • 08:55 legoktm: root@lists1001:/var/log/mailman# rm -rf *
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s3 weights T284897', diff saved to https://phabricator.wikimedia.org/P16717 and previous config saved to /var/cache/conftool/dbconfig/20210624-084147-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16716 and previous config saved to /var/cache/conftool/dbconfig/20210624-081409-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16715 and previous config saved to /var/cache/conftool/dbconfig/20210624-081251-marostegui.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s5 weights T284897', diff saved to https://phabricator.wikimedia.org/P16714 and previous config saved to /var/cache/conftool/dbconfig/20210624-081137-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1130 from s5 api T284897', diff saved to https://phabricator.wikimedia.org/P16713 and previous config saved to /var/cache/conftool/dbconfig/20210624-080945-marostegui.json
  • 08:09 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on 216 hosts with reason: Change replication monitoring config T284897
  • 08:08 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on 216 hosts with reason: Change replication monitoring config T284897
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s6 weights T284897', diff saved to https://phabricator.wikimedia.org/P16712 and previous config saved to /var/cache/conftool/dbconfig/20210624-075613-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s7 weights T284897', diff saved to https://phabricator.wikimedia.org/P16711 and previous config saved to /var/cache/conftool/dbconfig/20210624-074200-marostegui.json
  • 07:35 eileen: civicrm revision changed from 6d3dd6e5a5 to 9d1203fb28, config revision is 735af27f0d
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Adjust s8 weights T284897', diff saved to https://phabricator.wikimedia.org/P16710 and previous config saved to /var/cache/conftool/dbconfig/20210624-072657-marostegui.json
  • 03:57 dwisehaupt: civicrm revision is 6d3dd6e5a5, config revision is 735af27f0d
  • 03:26 dwisehaupt: civicrm revision is 6d3dd6e5a5, config revision is 1e8e9ac7b9
  • 00:25 eileen: civicrm revision changed from bd906975f0 to 6d3dd6e5a5, config revision is 821e5889f7
  • 00:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1447.eqiad.wmnet with reason: REIMAGE
  • 00:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 00:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1447.eqiad.wmnet with reason: REIMAGE
  • 00:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 00:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1446.eqiad.wmnet with reason: REIMAGE
  • 00:10 eileen: process-control config revision is 821e5889f7
  • 00:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1445.eqiad.wmnet with reason: REIMAGE
  • 00:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1443.eqiad.wmnet with reason: REIMAGE
  • 00:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1442.eqiad.wmnet with reason: REIMAGE
  • 00:05 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1443.eqiad.wmnet with reason: REIMAGE
  • 00:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE
  • 00:03 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1442.eqiad.wmnet with reason: REIMAGE
  • 00:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 00:01 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE
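
The cookbook entries above come in START / END pairs, and each END line carries a PASS or FAIL status plus an exit code. As a minimal sketch, assuming only the line shape visible in this log (the regular expression and the helper name `failed_cookbook_runs` are illustrative and not part of any Wikimedia tooling), the failed runs could be pulled out of the log text like this:

```python
import re

# Assumed line shape, inferred from the entries in this log only:
#   "HH:MM user@host: END (PASS|FAIL) - Cookbook <name> (exit_code=N) [for <duration> on <target> ...]"
END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>\S+?)@(?P<host>\S+?): "
    r"END \((?P<status>PASS|FAIL)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
    r"(?: for (?P<duration>.+?) on (?P<target>\S+))?"
)


def failed_cookbook_runs(lines):
    """Return (time, cookbook, target, exit_code) for every END (FAIL) line.

    Hypothetical helper: target is None when the line has no "for ... on ..." part.
    """
    failures = []
    for line in lines:
        match = END_RE.search(line)
        if match and match.group("status") == "FAIL":
            failures.append(
                (
                    match.group("time"),
                    match.group("cookbook"),
                    match.group("target"),
                    int(match.group("code")),
                )
            )
    return failures


# Three END lines copied from the 2021-06-24 entries above.
sample = [
    "00:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1441.eqiad.wmnet with reason: REIMAGE",
    "00:02 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE",
    "00:01 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE",
]

for failure in failed_cookbook_runs(sample):
    print(failure)
```

Run over the three sample lines, only the mw1439.eqiad.wmnet entry is reported. The log alone does not say why that run exited with code 99, so the sketch only locates the failure.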

2021-06-23

  • 23:59 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE
  • 23:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
  • 23:57 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE
  • 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
  • 23:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 23:53 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE
  • 23:52 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 23:51 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE
  • 23:50 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 23:49 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE
  • 23:48 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
  • 23:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
  • 23:46 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE
  • 23:45 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
  • 23:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE
  • 23:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE
  • 23:42 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
  • 23:41 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
  • 23:41 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE
  • 23:39 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
  • 23:39 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE
  • 23:37 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE
  • 23:36 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
  • 23:35 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE
  • 23:34 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
  • 23:33 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE
  • 23:32 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
  • 23:31 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE
  • 23:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
  • 23:29 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE
  • 23:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
  • 23:27 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE
  • 23:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 23:25 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE
  • 23:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 23:23 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE
  • 23:22 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
  • 23:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
  • 23:21 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE
  • 23:19 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
  • 23:19 dduvall: rolling back 1.37.0-wmf.11 from group1 (T281152) due to reoccurrence of "PHP Notice: Undefined index: frameCount" now at PNGHandler.php:156 (T285431)
  • 23:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE
  • 23:17 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
  • 23:17 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE
  • 23:15 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
  • 23:15 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE
  • 23:14 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s)
  • 23:13 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
  • 23:13 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 23:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE
  • 23:11 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
  • 23:11 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE
  • 23:10 dduvall: re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes
  • 23:09 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE
  • 23:05 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/GIFHandler.php: Backport: Check for _error in getting metadata array in GIFHandler (T285431) (duration: 01m 06s)
  • 22:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.11/includes/media/PNGHandler.php: Backport: Check for _error in getting metadata array in PNGHandler (T285431) (duration: 01m 06s)
  • 22:26 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
  • 22:24 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE
  • 21:45 sbassett: Deployed updated security patch for T285190 to wmf.9 and wmf.11
  • 20:55 ejegg: updated payments-wiki from 42cfbe832d to d9892207c1
  • 20:38 eileen: civicrm revision changed from 53d103f672 to bd906975f0, config revision is 6a88618c3e
  • 20:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 20:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:42 dduvall@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9
  • 19:39 dduvall: rolling back wmf.11 from group1 due to increase in logspam possibly related to noted risky patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298 (cc T281152 and patch contact Amir1)
  • 19:35 herron: rebooting kafkamon hosts for updates
  • 19:26 dduvall@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s)
  • 19:25 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11
  • 19:20 dduvall: preparing to promote wmf.11 group1 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
  • 19:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6e0f5ad: Enable GrowthExperiments donor landing page for testing (T284799) (duration: 01m 05s)
  • 19:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: 2338e53: Revert "Add custom signup flow for donors" (T284740; T284800; T285281) (duration: 01m 06s)
  • 18:58 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:55 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: REVERT: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 38s)
  • 18:55 urbanecm@deploy1002: sync-file aborted: REVERT: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 01s)
  • 18:54 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:54 urbanecm@deploy1002: Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 18:53 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: 76e5fc9: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 01m 07s)
  • 18:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/WikimediaEvents/extension.json: 01f034b: Finalize WMDEBanner* schema migration to Event Platform (T282562) (duration: 01m 05s)
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 17efbaf: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki (T279886; T285385) (duration: 01m 06s)
  • 18:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3a2fc6e: Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites (duration: 01m 05s)
  • 18:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try (duration: 09m 11s)
  • 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b4a7867: Make Growth features available to newcomers at lvwiki and skwiki (T278191; T284149) (duration: 01m 06s)
  • 17:58 herron: beginning rolling reboots of kafka-main100[1-5] for updates
  • 17:57 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for NavigationTiming ext streams - T271208, T266798 (duration: 01m 29s)
  • 17:07 herron: beginning rolling reboots of kafka-main200[1-5] for updates
  • 16:42 XioNoX: re-start sending traffic on the codfw-eqsin Telia transport link
  • 15:17 topranks: Removing peering to AS64050 / "BGP Consultancy Pte Ltd" at AMS-IX (cr2-esams). Peer has left IX.
  • 14:54 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s1
  • 14:53 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s8
  • 13:54 effie: rolling restart thanos-fe* to pick up new tegola-vector-tiles account - T283049
  • 13:45 volans: uploaded cumin_4.1.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:27 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s4
  • 12:59 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s3
  • 12:46 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s7
  • 12:35 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s6
  • 12:26 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s5
  • 12:15 urbanecm: [urbanecm@mwmaint1002 ~]$ foreachwikiindblist s2 recountCategories.php --mode=pages && foreachwikiindblist s2 recountCategories.php --mode=subcats && foreachwikiindblist s2 recountCategories.php --mode=files # T170737
  • 11:46 XioNoX: Simplify labs-in4/6 firewall filters - CR700939
  • 11:10 topranks: Removing peering to AS39651 / "Com Hem AB" at AMS-IX (cr2-esams). Peer has left IX.
  • 10:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:35 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 20s)
  • 09:35 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
  • 09:22 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:48 volans: sudo systemctl start ferm.service on thanos-fe2002 (DNS query timeout)
  • 08:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 14s)
  • 08:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
  • 07:57 kart_: cxserver: Removed Matxin MT support and added more language support to Elia MT (T285199, T284900)
  • 07:54 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:49 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 07:46 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 07:26 legoktm: uploaded mailman3_3.3.3-1~bpo10+6_amd64.changes on apt1001
  • 07:08 legoktm: updating mailman packages on lists1001 and restarting (T285120, T280889)
  • 06:56 ryankemper: [WDQS] `ryankemper@wdqs1006:~$ sudo pool`
  • 06:37 ryankemper: [WDQS] `ryankemper@wdqs2001:~$ sudo pool`
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16703 and previous config saved to /var/cache/conftool/dbconfig/20210623-062819-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16702 and previous config saved to /var/cache/conftool/dbconfig/20210623-061316-root.json
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16701 and previous config saved to /var/cache/conftool/dbconfig/20210623-055812-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Start repooling db1100', diff saved to https://phabricator.wikimedia.org/P16700 and previous config saved to /var/cache/conftool/dbconfig/20210623-054252-marostegui.json
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16699 and previous config saved to /var/cache/conftool/dbconfig/20210623-045217-root.json
  • 01:04 eileen: process-control config revision is 6a88618c3e
  • 00:50 eileen: civicrm revision changed from c745d4f075 to 03bead707d, config revision is 4ab72c1033
  • 00:40 legoktm: uploaded new versions of flufl.bounce_4.0-1_amd64.changes hyperkitty_1.3.4-2~bpo10+4_amd64.changes mailman3_3.3.3-1~bpo10+5_amd64.changes mailman-hyperkitty_1.1.0-10~bpo10+1_amd64.changes to apt1001
  • 00:02 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)

2021-06-22

  • 23:23 ebernhardson@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable canary events for search event streams (duration: 01m 05s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7865f27: Add unwatchedpages to rollbacker on frwiki (T285334) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 3/3) (duration: 01m 07s)
  • 23:05 urbanecm@deploy1002: Synchronized wmf-config/config/nlwiki.yaml: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 2/3) (duration: 01m 05s)
  • 23:04 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 9a594f0: Enable Growth features in dark mode at nlwiki (T285254; 1/3) (duration: 01m 37s)
  • 22:42 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=subcats # T170737
  • 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript recountCategories.php --wiki=zhwiki --mode=pages # T170737
  • 22:38 urbanecm: mwscript recountCategories.php --wiki=eowiktionary --mode={pages,subcats,files} (T170737)
  • 21:05 eileen: civicrm revision changed from 629bd3b7b7 to c745d4f075, config revision is 4ab72c1033
  • 21:05 ejegg: updated payments-wiki from 7be0534b91 to 42cfbe832d
  • 20:46 brennen: gitlab1001: running ansible to deploy CAS: stop marking users as external (T274461)
  • 20:23 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
  • 20:19 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-web1001.eqiad.wmnet with reason: REIMAGE
  • 20:12 Trey314159: reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T284185)
  • 20:12 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
  • 19:58 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/c/operations/gitlab-ansible/+/699812 (T264231)
  • 19:26 legoktm: set mediawiki-l message acceptance to discard non-member posts instead of reject
  • 19:09 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.11
  • 19:06 dduvall: preparing to promote wmf.11 group0 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712
  • 19:01 dduvall@deploy1002: Pruned MediaWiki: 1.37.0-wmf.6 (duration: 03m 35s)
  • 18:46 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs (duration: 04m 23s)
  • 18:42 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@75d35b4]: revert expect eventgate canary events in all dcs
  • 18:31 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
  • 18:30 awight@deploy1002: Synchronized php-1.37.0-wmf.11/extensions/VisualEditor: Backport: Revert "Fall back from explicit parameter order to TemplateData sort" () (duration: 01m 09s)
  • 18:28 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1006.eqiad.wmnet with reason: REIMAGE
  • 18:28 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
  • 18:27 awight@deploy1002: sync-file aborted: Backport: Revert "Fall back from explicit parameter order to TemplateData sort" () (duration: 00m 40s)
  • 18:26 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on thumbor1005.eqiad.wmnet with reason: REIMAGE
  • 18:19 legoktm: pulled in updates for thirdparty/kubeadm-k8s-1-18 buster-wikimedia on apt1001
  • 17:47 brennen: gitlab1001: run ansible to deploy https://gerrit.wikimedia.org/r/700851 (T274463)
  • 17:43 dduvall: testwikis to 1.37.0-wmf.11 (cc open blockers T285125 T285118 T271011)
  • 17:41 dduvall@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.11 (duration: 30m 59s)
  • 17:21 moritzm: installing isc-dhcp security updates
  • 17:18 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:16 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:14 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 17:11 moritzm: installing ruby-websocket-extensions security updates
  • 17:10 dduvall@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.11
  • 17:08 mbsantos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:07 moritzm: installing velocity security updates
  • 17:07 mbsantos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 17:04 dduvall: 1.37.0-wmf.11 was branched at c161d3b for T281152
  • 17:04 mbsantos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'wikifeeds' for release 'staging' .
  • 16:41 Trey314159: reindexing Dutch wikis on elastic@eqiad, elastic@codfw, and cloudelastic (T284185)
  • 14:57 dcausse@deploy1002: Finished deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74 (duration: 13m 26s)
  • 14:43 dcausse@deploy1002: Started deploy [wdqs/wdqs@b082ccc]: wdqs 0.3.74
  • 14:37 XioNoX: start updating analytics firewall rules to capirca generated ones on cr2-eqiad - T279429
  • 14:35 hoo: Updated the Wikidata property suggester with data from the 2021-05-31 JSON dump (with pre-applied T132839 workarounds)
  • 14:01 XioNoX: start updating analytics firewall rules to capirca generated ones on cr1-eqiad - T279429
  • 13:49 kormat: disabling puppet on A:db-all for T285079
  • 13:38 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging/php]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=nlwiki --phab=T285254 # T285254
  • 13:37 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki-staging]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=nlwiki growthexperiments # T285254
  • 13:33 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Correctly enable Vector language switcher treatment A/B test (T269093) (duration: 00m 57s)
  • 13:29 urbanecm: urbanecm@mwmaint1002:~$ foreachwikiindblist growthexperiments extensions/WikimediaMaintenance/createExtensionTables.php growthexperiments # T266913
  • 13:29 Trey314159: reindexing German wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
  • 12:04 Lucas_WMDE: backport+config window done
  • 12:03 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable new Vector Languages-in-header feature & AB test for pilot wikis (T269093) (duration: 00m 56s)
  • 11:58 Lucas_WMDE: lucaswerkmeister-wmde@mwdebug1001:~$ sudo -u mwdeploy sh -c 'rm /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n/l10n_cache-*.cdb && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache/l10n && rmdir /srv/mediawiki/php-1.37.0-wmf.1/cache && rmdir /srv/mediawiki/php-1.37.0-wmf.1' # per comments in T157030 and similar tasks
  • 11:54 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/: Backport: launchULS: Add context to interface.language.change hook (T280770) (duration: 00m 57s)
  • 11:35 moritzm: installing fluidsynth security updates
  • 11:17 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: enwiki: Remove 'collectionsaveascommunitypage' from the 'autoconfirmed' user group (T283523) (duration: 00m 56s)
  • 11:06 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16691 and previous config saved to /var/cache/conftool/dbconfig/20210622-110619-kormat.json
  • 10:51 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16690 and previous config saved to /var/cache/conftool/dbconfig/20210622-105115-kormat.json
  • 10:36 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16689 and previous config saved to /var/cache/conftool/dbconfig/20210622-103612-kormat.json
  • 10:21 kormat@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16688 and previous config saved to /var/cache/conftool/dbconfig/20210622-102108-kormat.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16687 and previous config saved to /var/cache/conftool/dbconfig/20210622-094019-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16686 and previous config saved to /var/cache/conftool/dbconfig/20210622-092515-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16685 and previous config saved to /var/cache/conftool/dbconfig/20210622-092056-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16684 and previous config saved to /var/cache/conftool/dbconfig/20210622-091012-root.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16683 and previous config saved to /var/cache/conftool/dbconfig/20210622-090552-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after upgrade', diff saved to https://phabricator.wikimedia.org/P16682 and previous config saved to /var/cache/conftool/dbconfig/20210622-085508-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16681 and previous config saved to /var/cache/conftool/dbconfig/20210622-085049-root.json
  • 08:49 marostegui: Upgrade db1166
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16680 and previous config saved to /var/cache/conftool/dbconfig/20210622-084915-marostegui.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169 after schema change', diff saved to https://phabricator.wikimedia.org/P16679 and previous config saved to /var/cache/conftool/dbconfig/20210622-083545-root.json
  • 07:53 joe: uploaded wmf-certificates package to buster-wikimedia/main, T284417
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169 T283499', diff saved to https://phabricator.wikimedia.org/P16678 and previous config saved to /var/cache/conftool/dbconfig/20210622-072828-marostegui.json
  • 06:43 dcausse: repool wdqs1005
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
  • 06:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1100.eqiad.wmnet with reason: REIMAGE
  • 05:06 marostegui: Stop replication on old s5 master (db1100) - T284529
  • 05:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool old master running 10.1 T284529', diff saved to https://phabricator.wikimedia.org/P16677 and previous config saved to /var/cache/conftool/dbconfig/20210622-050602-marostegui.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 master and set section read-write T284529', diff saved to https://phabricator.wikimedia.org/P16676 and previous config saved to /var/cache/conftool/dbconfig/20210622-050123-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T284529', diff saved to https://phabricator.wikimedia.org/P16675 and previous config saved to /var/cache/conftool/dbconfig/20210622-050036-root.json
  • 05:00 marostegui: Starting s5 eqiad failover from db1100 to db1130 - T284529
  • 04:20 marostegui: Start topology changes for s5 switchover T284529
  • 04:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Master switchover s5 T284529
  • 04:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Master switchover s5 T284529
  • 04:11 eileen: process-control config revision is 4ab72c1033
  • 01:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
  • 01:00 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2026.codfw.wmnet with reason: REIMAGE
  • 00:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2025.codfw.wmnet with reason: REIMAGE

2021-06-21

  • 23:16 krinkle@deploy1002: Synchronized wmf-config/mc.php: I13646a5557c9 (duration: 00m 55s)
  • 23:12 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I302a71 (duration: 00m 56s)
  • 23:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Idcac4d (duration: 00m 56s)
  • 23:05 krinkle@deploy1002: Synchronized wmf-config/mc.php: I877a3e (duration: 00m 57s)
  • 23:04 krinkle@deploy1002: Synchronized wmf-config/mc.php: Icc2676 (duration: 00m 56s)
  • 22:57 krinkle@deploy1002: Synchronized wmf-config/mc.php: Iea94283c53 (duration: 00m 57s)
  • 22:53 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: Iea94283c53 (duration: 00m 57s)
  • 22:42 eileen: civicrm revision changed from 0fca489063 to 629bd3b7b7, config revision is 2aed6ff89b
  • 22:41 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=viwiki --fix # T284868 # P16674
  • 22:13 eileen: civicrm revision changed from acbcce94a2 to 0fca489063, config revision is 2aed6ff89b
  • 21:11 sbassett: Deployed security patch for T285190
  • 19:19 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
  • 19:19 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on doh1001.wikimedia.org with reason: temporarily depooling host
  • 18:41 ppchelko@deploy1002: Synchronized wmf-config/wikitech.php: Replace uses of AbstractBlock::getTarget() T284141 (duration: 00m 58s)
  • 18:30 urbanecm@deploy1002: Synchronized wmf-config/PoolCounterSettings.php: af61f1a: Add pool counter for automated search requests (T284479) (duration: 00m 59s)
  • 18:30 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@40b4b2f]: T273854 Airflow dag to extract and process sparql queries (duration: 07m 11s)
  • 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f7db2b9: Enable wikilove on hewikisource (T284864) (duration: 00m 56s)
  • 18:26 brennen: gitlab1001: running ansible for copying latest backup to dedicated folder (T274463)
  • 18:24 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hewikisource wikilove # T284864
  • 18:23 urbanecm: Correction: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource wikilove # T284864
  • 18:23 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=hiwikisource # T284864
  • 18:22 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@40b4b2f]: T273854 Airflow dag to extract and process sparql queries
  • 18:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: dd0fecb: Rename Portal and Portal talk namespaces on viwiki (T284868) (duration: 00m 56s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5d8b9df: Disable Education Program namespaces in enwiki (T285193) (duration: 00m 58s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 5a51dd2: Add `managechangetags` to the `abusefilter` group on eswiki (T285167) (duration: 00m 56s)
  • 18:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 219dd5b: eswiki AbuseFilter config changes (T284797; 2/2) (duration: 00m 56s)
  • 18:10 urbanecm@deploy1002: Synchronized wmf-config/abusefilter.php: 219dd5b: eswiki AbuseFilter config changes (T284797; 1/2) (duration: 01m 07s)
  • 17:40 ebernhardson: post-deploy restart airflow-webserver and airflow-scheduler on an-airflow1001
  • 17:32 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs (duration: 04m 24s)
  • 17:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2337592]: airflow: expect eventgate canary events in all dcs
  • 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 papaul: poweroff elastic2043 for maintenance
  • 15:25 hashar: Updated operations-puppet-tests-buster-docker Jenkins job to use latest Docker image https://gerrit.wikimedia.org/r/c/integration/config/+/700648
  • 15:06 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1009.eqiad.wmnet
  • 15:02 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1004.eqiad.wmnet
  • 15:01 hnowlan@cumin1001: START - Cookbook sre.hosts.reboot-single for host maps1009.eqiad.wmnet
  • 14:57 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1004.eqiad.wmnet
  • 14:57 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1003.eqiad.wmnet
  • 14:52 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1003.eqiad.wmnet
  • 14:52 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 14:47 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 14:44 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 14:40 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 14:39 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet
  • 14:37 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1002.eqiad.wmnet
  • 14:37 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet
  • 14:34 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl1001.eqiad.wmnet
  • 14:30 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1003.eqiad.wmnet
  • 14:28 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1003.eqiad.wmnet
  • 14:24 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1002.eqiad.wmnet
  • 14:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
  • 14:22 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1002.eqiad.wmnet
  • 14:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1123.eqiad.wmnet with reason: REIMAGE
  • 14:21 volans: deployed spicerack release v0.0.54 on the cumin hosts
  • 14:19 XioNoX: reboot scs-c1-codfw - T285229
  • 14:18 klausman@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd1001.eqiad.wmnet
  • 14:17 XioNoX: reboot scs-a1-codfw - T285229
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1008.eqiad.wmnet
  • 14:16 klausman@cumin1001: START - Cookbook sre.hosts.reboot-single for host ml-etcd1001.eqiad.wmnet
  • 14:14 klausman: starting update of ML team's etcd machines in eqiad
  • 14:14 volans: uploaded spicerack_0.0.54 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 14:11 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2003.codfw.wmnet
  • 14:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1008.eqiad.wmnet
  • 14:06 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 14:05 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2003.codfw.wmnet
  • 14:04 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2004.codfw.wmnet
  • 13:58 XioNoX: reboot scs-eqsin - T285229
  • 13:58 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2004.codfw.wmnet
  • 13:57 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2001.codfw.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1006.eqiad.wmnet
  • 13:56 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 13:55 jynus: stopping replication at db1171:s3 at db1123-bin.004363:906878073
  • 13:51 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2001.codfw.wmnet
  • 13:51 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve2002.codfw.wmnet
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1006.eqiad.wmnet
  • 13:48 XioNoX: reboot scs-ulsfo
  • 13:45 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve2002.codfw.wmnet
  • 13:40 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet
  • 13:38 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2001.codfw.wmnet
  • 13:35 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MobileFrontend/includes/ExtMobileFrontend.php: Backport: Avoid loading the whole entity when it only needs description. (T269960) (duration: 00m 58s)
  • 13:28 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-serve-ctrl2002.codfw.wmnet
  • 13:24 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2003.codfw.wmnet
  • 13:21 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2003.codfw.wmnet
  • 13:21 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2002.codfw.wmnet
  • 13:19 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2002.codfw.wmnet
  • 13:17 klausman@cumin2001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-etcd2001.codfw.wmnet
  • 13:14 klausman@cumin2001: START - Cookbook sre.hosts.reboot-single for host ml-etcd2001.codfw.wmnet
  • 13:12 elukey: upload istioctl 1.9.5 to {buster,stretch}-wikimedia
  • 13:12 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 40 hosts with reason: Merged broken patch
  • 13:12 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 40 hosts with reason: Merged broken patch
  • 13:09 klausman: starting update of ML team's etcd machines in codfw
  • 12:55 godog: move librenms alerts with "max alerts" == -1 to "interval" being 15m - T285205
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16672 and previous config saved to /var/cache/conftool/dbconfig/20210621-124030-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16671 and previous config saved to /var/cache/conftool/dbconfig/20210621-123906-root.json
  • 12:35 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase: Backport: Rewrite SerializationModifier to be more efficient (duration: 01m 02s)
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1010.eqiad.wmnet
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16670 and previous config saved to /var/cache/conftool/dbconfig/20210621-122526-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16669 and previous config saved to /var/cache/conftool/dbconfig/20210621-122403-root.json
  • 12:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1010.eqiad.wmnet
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2008.codfw.wmnet
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16668 and previous config saved to /var/cache/conftool/dbconfig/20210621-121023-root.json
  • 12:10 godog: bump space for k8s and ops prometheus on prometheus1004 (prometheus1003 has been expanded previously but not logged)
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16667 and previous config saved to /var/cache/conftool/dbconfig/20210621-120859-root.json
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2008.codfw.wmnet
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16665 and previous config saved to /var/cache/conftool/dbconfig/20210621-115519-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T283499', diff saved to https://phabricator.wikimedia.org/P16664 and previous config saved to /var/cache/conftool/dbconfig/20210621-115441-marostegui.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after schema change', diff saved to https://phabricator.wikimedia.org/P16663 and previous config saved to /var/cache/conftool/dbconfig/20210621-115355-root.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 T283499', diff saved to https://phabricator.wikimedia.org/P16662 and previous config saved to /var/cache/conftool/dbconfig/20210621-115143-marostegui.json
  • 11:23 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0bf35e0: Disable indexing user (sub)pages and draft-related pages on hrwiki (T284384) (duration: 00m 56s)
  • 11:21 urbanecm@deploy1002: Synchronized logos/config.yaml: 1b97376: Change vi.wikisource logo to the same logo being used at en.wikisource (T284612) (duration: 00m 56s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 1b97376: Change vi.wikisource logo to the same logo being used at en.wikisource (T284612) (duration: 00m 57s)
  • 11:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 11:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 464cc0b: ptwikinews: Remove NS ID 102,103 (T285163) (duration: 00m 56s)
  • 11:11 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Add WMCS public addresses to $wgSoftBlockRanges (duration: 00m 56s)
  • 11:04 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 (duration: 02m 53s)
  • 11:01 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
  • 10:55 moritzm: restarting FPM on mw canaries to pick up nettle security updates
  • 10:45 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
  • 10:45 moritzm: installing nettle security updates on buster
  • 10:44 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:44 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 54s)
  • 10:43 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:41 volans@deploy1002: Finished deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv (duration: 00m 50s)
  • 10:41 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:40 volans@deploy1002: Started deploy [netbox/deploy@977f7b6]: Force re-creation of the virtualenv
  • 10:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:37 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 (duration: 02m 22s)
  • 10:36 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4
  • 10:36 jbond@deploy1002: Finished deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 56s)
  • 10:29 jbond@deploy1002: Started deploy [netbox/deploy@977f7b6]: deploy v2.10.4-wmf4 to netbox-next
  • 10:27 jbond@deploy1002: Finished deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next (duration: 03m 12s)
  • 10:24 jbond@deploy1002: Started deploy [netbox/deploy@6b69f2c]: deploy v2.10.4-wmf4 to netbox-next
  • 10:22 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 22s)
  • 10:20 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:19 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 02m 13s)
  • 10:17 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:16 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 03s)
  • 10:15 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next (duration: 01m 30s)
  • 10:13 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 to netbox-next
  • 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4 (duration: 03m 10s)
  • 10:10 jbond@deploy1002: Started deploy [netbox/deploy@c7762f8]: deploy v2.10.4-wmf4
  • 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/FlaggedRevs: Backport: Drop LocalFile::getHistory hook handler (T284777 T277883) (duration: 00m 58s)
  • 09:52 ladsgroup@deploy1002: Synchronized wmf-config/Wikibase.php: Config: Enable wikisource group as langlink group of sourcewiki (T275958) (duration: 00m 56s)
  • 09:44 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set $wmgWikibaseTmpSerializeEmptyListsAsObjects to true everywhere (T241422) (duration: 00m 57s)
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16659 and previous config saved to /var/cache/conftool/dbconfig/20210621-094049-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T284529', diff saved to https://phabricator.wikimedia.org/P16658 and previous config saved to /var/cache/conftool/dbconfig/20210621-092623-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16657 and previous config saved to /var/cache/conftool/dbconfig/20210621-092545-root.json
  • 09:19 ladsgroup@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 04m 49s)
  • 09:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16656 and previous config saved to /var/cache/conftool/dbconfig/20210621-091041-root.json
  • 09:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:56 marostegui: Deploy T266486 T268392 T273360 on db1123
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16655 and previous config saved to /var/cache/conftool/dbconfig/20210621-085538-root.json
  • 08:31 dcausse: depooling wdqs1005 (lag)
  • 07:47 moritzm: updated buster d-i image for Buster 10.10 point release (which included ABI bump for Linux kernel)
  • 07:44 jayme: started debian-weekly-rebuild.service on deneb (it failed due to 404 on snapshots.debian.org yesterday)
  • 06:49 moritzm: installing libwebp security updates on buster
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16654 and previous config saved to /var/cache/conftool/dbconfig/20210621-062156-root.json
  • 06:20 marostegui: Re-add rev_page_id to db1135 T163532 T285149
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 T163532', diff saved to https://phabricator.wikimedia.org/P16653 and previous config saved to /var/cache/conftool/dbconfig/20210621-062014-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16652 and previous config saved to /var/cache/conftool/dbconfig/20210621-060652-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16651 and previous config saved to /var/cache/conftool/dbconfig/20210621-055149-root.json
  • 05:50 kart_: cxserver: Added support for Elia MT + Updated to 2021-06-10-074331-production (T276059, T275803, T276246, T283513, T255231, T237028)
  • 05:41 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after re-adding rev_page_id index', diff saved to https://phabricator.wikimedia.org/P16650 and previous config saved to /var/cache/conftool/dbconfig/20210621-053645-root.json
  • 05:33 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:31 kormat: stopping replication on db1123 T283131
  • 05:25 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:11 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1123 until it's reimaged to buster T284648', diff saved to https://phabricator.wikimedia.org/P16649 and previous config saved to /var/cache/conftool/dbconfig/20210621-051149-kormat.json
  • 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1157 to s3 master and set section read-write T284648', diff saved to https://phabricator.wikimedia.org/P16648 and previous config saved to /var/cache/conftool/dbconfig/20210621-050506-kormat.json
  • 05:03 kormat@cumin1001: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T284648', diff saved to https://phabricator.wikimedia.org/P16647 and previous config saved to /var/cache/conftool/dbconfig/20210621-050304-kormat.json
  • 05:02 kormat: Starting s3 eqiad failover from db1123 to db1157 - T284648
  • 04:49 kormat@cumin1001: dbctl commit (dc=all): 'Set db1157 with weight 0 T284648', diff saved to https://phabricator.wikimedia.org/P16646 and previous config saved to /var/cache/conftool/dbconfig/20210621-044955-kormat.json
  • 04:48 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 21 hosts with reason: Master switchover s3 T284648
  • 04:48 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 21 hosts with reason: Master switchover s3 T284648
  • 04:40 marostegui: Re-add rev_page_id to db1099:3311 T163532 T285149
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T163532', diff saved to https://phabricator.wikimedia.org/P16645 and previous config saved to /var/cache/conftool/dbconfig/20210621-043941-marostegui.json

2021-06-18

  • 20:55 Krinkle: Remove doc1001:/srv/doc/mediawiki-core/wmf-1.36.0-wmf.31-testing
  • 13:29 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16640 and previous config saved to /var/cache/conftool/dbconfig/20210618-125306-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16639 and previous config saved to /var/cache/conftool/dbconfig/20210618-123802-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16638 and previous config saved to /var/cache/conftool/dbconfig/20210618-122526-root.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16637 and previous config saved to /var/cache/conftool/dbconfig/20210618-122259-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16636 and previous config saved to /var/cache/conftool/dbconfig/20210618-121022-root.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16635 and previous config saved to /var/cache/conftool/dbconfig/20210618-120755-root.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16634 and previous config saved to /var/cache/conftool/dbconfig/20210618-115518-root.json
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16633 and previous config saved to /var/cache/conftool/dbconfig/20210618-114015-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16631 and previous config saved to /var/cache/conftool/dbconfig/20210618-112739-marostegui.json
  • 09:44 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:21 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:49 XioNoX: eqsin-codfw link re-enabled but drained
  • 08:39 legoktm: finished adding shellbox LVS entry, https://shellbox.svc.eqiad.wmnet:4008/ and https://shellbox.svc.codfw.wmnet:4008/ now work (T281423)
  • 08:30 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16630 and previous config saved to /var/cache/conftool/dbconfig/20210618-081737-root.json
  • 08:06 legoktm@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=shellbox
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16629 and previous config saved to /var/cache/conftool/dbconfig/20210618-080233-root.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16628 and previous config saved to /var/cache/conftool/dbconfig/20210618-074729-root.json
  • 07:44 legoktm: restarting pybal on lvs1015, lvs2009 (active) - T281423
  • 07:35 legoktm: restarting pybal on lvs1016, lvs2010 to add shellbox
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16627 and previous config saved to /var/cache/conftool/dbconfig/20210618-073225-root.json
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2010.codfw.wmnet
  • 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2010.codfw.wmnet
  • 06:58 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16626 and previous config saved to /var/cache/conftool/dbconfig/20210618-063632-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16625 and previous config saved to /var/cache/conftool/dbconfig/20210618-062452-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16624 and previous config saved to /var/cache/conftool/dbconfig/20210618-062129-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16623 and previous config saved to /var/cache/conftool/dbconfig/20210618-060625-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16622 and previous config saved to /var/cache/conftool/dbconfig/20210618-060452-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16621 and previous config saved to /var/cache/conftool/dbconfig/20210618-055122-root.json
  • 05:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16620 and previous config saved to /var/cache/conftool/dbconfig/20210618-054949-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16619 and previous config saved to /var/cache/conftool/dbconfig/20210618-054841-marostegui.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16618 and previous config saved to /var/cache/conftool/dbconfig/20210618-054659-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16617 and previous config saved to /var/cache/conftool/dbconfig/20210618-053445-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16616 and previous config saved to /var/cache/conftool/dbconfig/20210618-053156-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16615 and previous config saved to /var/cache/conftool/dbconfig/20210618-051942-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16614 and previous config saved to /var/cache/conftool/dbconfig/20210618-051712-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16613 and previous config saved to /var/cache/conftool/dbconfig/20210618-051652-root.json
  • 05:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16612 and previous config saved to /var/cache/conftool/dbconfig/20210618-050148-root.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16611 and previous config saved to /var/cache/conftool/dbconfig/20210618-045808-marostegui.json
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16610 and previous config saved to /var/cache/conftool/dbconfig/20210618-045743-marostegui.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16609 and previous config saved to /var/cache/conftool/dbconfig/20210618-045355-marostegui.json

2021-06-17

  • 21:49 legoktm: regenerating pipermail redirects to skip those with duplicate message-ids (T280731)
  • 18:24 ryankemper: T285106 [WDQS] `ryankemper@wdqs2001:~$ sudo depool`
  • 18:01 dancy: Deployed latest scap code to beta cluster
  • 13:28 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Wikibase/client/includes/ClientHooks.php: Backport: client: Bring back using the client setting for langlink group (T284854) (duration: 00m 58s)
  • 13:28 jbond: add prometheus-jmx-exporter to bullseye-wikimedia
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16604 and previous config saved to /var/cache/conftool/dbconfig/20210617-121146-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16603 and previous config saved to /var/cache/conftool/dbconfig/20210617-120109-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16602 and previous config saved to /var/cache/conftool/dbconfig/20210617-115643-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16601 and previous config saved to /var/cache/conftool/dbconfig/20210617-115319-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16600 and previous config saved to /var/cache/conftool/dbconfig/20210617-114605-root.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16599 and previous config saved to /var/cache/conftool/dbconfig/20210617-114139-root.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16598 and previous config saved to /var/cache/conftool/dbconfig/20210617-113816-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16597 and previous config saved to /var/cache/conftool/dbconfig/20210617-113101-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16596 and previous config saved to /var/cache/conftool/dbconfig/20210617-112635-root.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16595 and previous config saved to /var/cache/conftool/dbconfig/20210617-112431-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16594 and previous config saved to /var/cache/conftool/dbconfig/20210617-112312-root.json
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16593 and previous config saved to /var/cache/conftool/dbconfig/20210617-111558-root.json
  • 11:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16592 and previous config saved to /var/cache/conftool/dbconfig/20210617-111026-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16591 and previous config saved to /var/cache/conftool/dbconfig/20210617-110808-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16590 and previous config saved to /var/cache/conftool/dbconfig/20210617-110656-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16589 and previous config saved to /var/cache/conftool/dbconfig/20210617-110200-marostegui.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16588 and previous config saved to /var/cache/conftool/dbconfig/20210617-105153-root.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16587 and previous config saved to /var/cache/conftool/dbconfig/20210617-103649-root.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after schema change', diff saved to https://phabricator.wikimedia.org/P16586 and previous config saved to /var/cache/conftool/dbconfig/20210617-102145-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130', diff saved to https://phabricator.wikimedia.org/P16585 and previous config saved to /var/cache/conftool/dbconfig/20210617-101827-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16584 and previous config saved to /var/cache/conftool/dbconfig/20210617-100445-root.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16583 and previous config saved to /var/cache/conftool/dbconfig/20210617-094942-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16582 and previous config saved to /var/cache/conftool/dbconfig/20210617-093438-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16581 and previous config saved to /var/cache/conftool/dbconfig/20210617-092056-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161 after schema change', diff saved to https://phabricator.wikimedia.org/P16580 and previous config saved to /var/cache/conftool/dbconfig/20210617-091934-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161', diff saved to https://phabricator.wikimedia.org/P16579 and previous config saved to /var/cache/conftool/dbconfig/20210617-090947-marostegui.json
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16578 and previous config saved to /var/cache/conftool/dbconfig/20210617-090552-root.json
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16577 and previous config saved to /var/cache/conftool/dbconfig/20210617-085048-root.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16576 and previous config saved to /var/cache/conftool/dbconfig/20210617-084941-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110 after schema change', diff saved to https://phabricator.wikimedia.org/P16575 and previous config saved to /var/cache/conftool/dbconfig/20210617-083545-root.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16574 and previous config saved to /var/cache/conftool/dbconfig/20210617-083438-root.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16573 and previous config saved to /var/cache/conftool/dbconfig/20210617-083005-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16572 and previous config saved to /var/cache/conftool/dbconfig/20210617-082939-marostegui.json
  • 08:28 elukey: upload istioctl 1.6.14-1 to buster-wikimedia
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16571 and previous config saved to /var/cache/conftool/dbconfig/20210617-082437-root.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P16570 and previous config saved to /var/cache/conftool/dbconfig/20210617-082409-marostegui.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16569 and previous config saved to /var/cache/conftool/dbconfig/20210617-081934-root.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16568 and previous config saved to /var/cache/conftool/dbconfig/20210617-080933-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16567 and previous config saved to /var/cache/conftool/dbconfig/20210617-080430-root.json
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16566 and previous config saved to /var/cache/conftool/dbconfig/20210617-075825-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16565 and previous config saved to /var/cache/conftool/dbconfig/20210617-075429-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16564 and previous config saved to /var/cache/conftool/dbconfig/20210617-073926-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168', diff saved to https://phabricator.wikimedia.org/P16563 and previous config saved to /var/cache/conftool/dbconfig/20210617-073305-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16562 and previous config saved to /var/cache/conftool/dbconfig/20210617-073229-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16561 and previous config saved to /var/cache/conftool/dbconfig/20210617-071726-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16560 and previous config saved to /var/cache/conftool/dbconfig/20210617-070222-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16559 and previous config saved to /var/cache/conftool/dbconfig/20210617-064717-root.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16558 and previous config saved to /var/cache/conftool/dbconfig/20210617-063135-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16557 and previous config saved to /var/cache/conftool/dbconfig/20210617-062514-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16556 and previous config saved to /var/cache/conftool/dbconfig/20210617-061010-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16555 and previous config saved to /var/cache/conftool/dbconfig/20210617-055507-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16554 and previous config saved to /var/cache/conftool/dbconfig/20210617-054003-root.json
  • 05:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165', diff saved to https://phabricator.wikimedia.org/P16553 and previous config saved to /var/cache/conftool/dbconfig/20210617-053455-marostegui.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16552 and previous config saved to /var/cache/conftool/dbconfig/20210617-053105-root.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16551 and previous config saved to /var/cache/conftool/dbconfig/20210617-051601-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16550 and previous config saved to /var/cache/conftool/dbconfig/20210617-050057-root.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16549 and previous config saved to /var/cache/conftool/dbconfig/20210617-044554-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P16548 and previous config saved to /var/cache/conftool/dbconfig/20210617-044146-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16547 and previous config saved to /var/cache/conftool/dbconfig/20210617-044132-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16546 and previous config saved to /var/cache/conftool/dbconfig/20210617-043130-marostegui.json

2021-06-16

  • 21:35 legoktm@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 21:32 legoktm@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:41 dancy: Reverted Scap release on beta
  • 16:18 topranks: Resetting metric on Telia CCT IC-331929, cr1-codfw and cr3-eqsin.
  • 15:22 dancy: testing upcoming Scap release on beta
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16545 and previous config saved to /var/cache/conftool/dbconfig/20210616-125329-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16544 and previous config saved to /var/cache/conftool/dbconfig/20210616-123826-root.json
  • 12:34 kormat: deploying heartbeat service puppet change
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16543 and previous config saved to /var/cache/conftool/dbconfig/20210616-122322-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16541 and previous config saved to /var/cache/conftool/dbconfig/20210616-120818-root.json
  • 12:01 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 12:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131', diff saved to https://phabricator.wikimedia.org/P16540 and previous config saved to /var/cache/conftool/dbconfig/20210616-120015-marostegui.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16539 and previous config saved to /var/cache/conftool/dbconfig/20210616-112115-root.json
  • 11:20 hnowlan: running `nodetool cleanup` on maps1005
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16538 and previous config saved to /var/cache/conftool/dbconfig/20210616-110612-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16537 and previous config saved to /var/cache/conftool/dbconfig/20210616-105108-root.json
  • 10:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16536 and previous config saved to /var/cache/conftool/dbconfig/20210616-103604-root.json
  • 10:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1007.eqiad.wmnet with reason: REIMAGE
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16535 and previous config saved to /var/cache/conftool/dbconfig/20210616-102349-marostegui.json
  • 09:52 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1007.eqiad.wmnet
  • 09:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 09:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1007.eqiad.wmnet with reason: Reparenting from maps1009
  • 09:50 hnowlan: disabling puppet on maps1* to reparent maps1007 from new master maps1009
  • 09:47 kormat: truncating all pc* tables on pc1010 T282761
  • 09:40 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1009 as pc3 primary T282761 (duration: 00m 59s)
  • 09:04 kormat: Deploying wmfmariadbpy 0.7.1 T284819
  • 09:04 kormat: uploaded wmfmariadbpy 0.7.1 to apt.wm.o
  • 08:24 Amir1: running "update flaggedrevs set fr_quality = 0 where fr_quality != 0;" on all wikis where flagged revs is enabled (T279761)
  • 07:27 dcausse: cleanup old /var/log/airflow/scheduler logs to reclaim space on an-airflow1001
  • 06:55 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:52 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 05:06 marostegui: Upgrade clouddb1014

2021-06-15

  • 17:54 dancy: testing upcoming Scap release on beta
  • 17:21 mutante: new Wikimedia language "shi" added - Shilha /ˈʃɪlhə/ is a Berber language native to Shilha people. The endonym is Taclḥit /taʃlħijt/, and in recent English publications the language is often rendered Tashelhiyt or Tashelhit.
  • 17:17 mutante: new Wikimedia language "dag" added - Dagbani (or Dagbane), also known as Dagbanli and Dagbanle, is a Gur language spoken in Ghana.
  • 17:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
  • 17:09 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-master1002.eqiad.wmnet with reason: REIMAGE
  • 16:11 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
  • 16:11 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on an-master1002.eqiad.wmnet with reason: Update operating system to bullseye
  • 14:55 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 XioNoX: re-enable cr1-codfw:xe-5/1/2
  • 13:23 marostegui: Upgrade clouddb1018
  • 13:15 effie: enable puppet on canaries
  • 13:10 effie: disable puppet on canaries to deploy 699908
  • 10:45 XioNoX: re-enable cr1-codfw:xe-5/1/2
  • 09:42 XioNoX: cr1-codfw# set interfaces xe-5/1/2 disable
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2080', diff saved to https://phabricator.wikimedia.org/P16533 and previous config saved to /var/cache/conftool/dbconfig/20210615-092511-marostegui.json
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318, db2082', diff saved to https://phabricator.wikimedia.org/P16532 and previous config saved to /var/cache/conftool/dbconfig/20210615-092409-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318', diff saved to https://phabricator.wikimedia.org/P16531 and previous config saved to /var/cache/conftool/dbconfig/20210615-090802-marostegui.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2083', diff saved to https://phabricator.wikimedia.org/P16530 and previous config saved to /var/cache/conftool/dbconfig/20210615-090650-marostegui.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2084', diff saved to https://phabricator.wikimedia.org/P16529 and previous config saved to /var/cache/conftool/dbconfig/20210615-090243-marostegui.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2081', diff saved to https://phabricator.wikimedia.org/P16528 and previous config saved to /var/cache/conftool/dbconfig/20210615-090206-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2082', diff saved to https://phabricator.wikimedia.org/P16527 and previous config saved to /var/cache/conftool/dbconfig/20210615-085953-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091', diff saved to https://phabricator.wikimedia.org/P16526 and previous config saved to /var/cache/conftool/dbconfig/20210615-085938-marostegui.json
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2080 db2083 db2084 db2091', diff saved to https://phabricator.wikimedia.org/P16525 and previous config saved to /var/cache/conftool/dbconfig/20210615-083233-marostegui.json
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2081', diff saved to https://phabricator.wikimedia.org/P16524 and previous config saved to /var/cache/conftool/dbconfig/20210615-082857-marostegui.json
  • 06:10 XioNoX: roll OSPF link-protection to all routers - T167306
  • 02:30 eileen: civicrm revision changed from d9d61dad0b to acbcce94a2, config revision is 2aed6ff89b
  • 01:22 eileen: civicrm revision changed from 28ace1b86f to d9d61dad0b, config revision is 2aed6ff89b
  • 00:37 eileen: civicrm revision changed from 31d07115a0 to 28ace1b86f, config revision is 2aed6ff89b

2021-06-14

  • 21:40 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@baeee47]: T261407 bulk_daemon: Deploy prioritized topics (duration: 00m 49s)
  • 21:40 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@baeee47]: T261407 bulk_daemon: Deploy prioritized topics
  • 19:27 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1003.eqiad.wmnet
  • 19:21 twentyafterfour_: applying hotfix for T284397 and restarting php7.3-fpm on phab1001
  • 18:30 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1003.eqiad.wmnet
  • 17:05 jforrester@deploy1002: Finished deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915 (duration: 00m 07s)
  • 17:05 jforrester@deploy1002: Started deploy [integration/docroot@22061b6]: Actually add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915
  • 16:46 jforrester@deploy1002: Finished deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915 (duration: 00m 07s)
  • 16:46 jforrester@deploy1002: Started deploy [integration/docroot@ca7af97]: Add mediawiki/tools/api-testing JSDoc to doc.wikimedia for T236915
  • 15:56 razzi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1002.eqiad.wmnet
  • 15:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16521 and previous config saved to /var/cache/conftool/dbconfig/20210614-155258-root.json
  • 15:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16520 and previous config saved to /var/cache/conftool/dbconfig/20210614-153754-root.json
  • 15:24 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 15:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16519 and previous config saved to /var/cache/conftool/dbconfig/20210614-152250-root.json
  • 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1005.eqiad.wmnet
  • 15:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16518 and previous config saved to /var/cache/conftool/dbconfig/20210614-150747-root.json
  • 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1005.eqiad.wmnet
  • 15:04 razzi@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1002.eqiad.wmnet
  • 15:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1004.eqiad.wmnet
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1004.eqiad.wmnet
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 10%: Repool db1142 after upgrade', diff saved to https://phabricator.wikimedia.org/P16517 and previous config saved to /var/cache/conftool/dbconfig/20210614-145243-root.json
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1003.eqiad.wmnet
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16516 and previous config saved to /var/cache/conftool/dbconfig/20210614-145039-root.json
  • 14:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1003.eqiad.wmnet
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16515 and previous config saved to /var/cache/conftool/dbconfig/20210614-144130-marostegui.json
  • 14:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1002.eqiad.wmnet
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16514 and previous config saved to /var/cache/conftool/dbconfig/20210614-143536-root.json
  • 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1002.eqiad.wmnet
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16513 and previous config saved to /var/cache/conftool/dbconfig/20210614-143224-root.json
  • 14:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16512 and previous config saved to /var/cache/conftool/dbconfig/20210614-143211-root.json
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1001.eqiad.wmnet
  • 14:27 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice{BannerHistory,Impression} to EventGate on all wikis - T271168 (duration: 00m 57s)
  • 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1001.eqiad.wmnet
  • 14:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2007.codfw.wmnet
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16511 and previous config saved to /var/cache/conftool/dbconfig/20210614-142032-root.json
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16510 and previous config saved to /var/cache/conftool/dbconfig/20210614-142014-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16509 and previous config saved to /var/cache/conftool/dbconfig/20210614-141720-root.json
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16508 and previous config saved to /var/cache/conftool/dbconfig/20210614-141707-root.json
  • 14:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate CentralNotice{BannerHistory,Impression} to EventGate on testwiki - T271168 (duration: 00m 57s)
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2007.codfw.wmnet
  • 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2006.codfw.wmnet
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16507 and previous config saved to /var/cache/conftool/dbconfig/20210614-140529-root.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16506 and previous config saved to /var/cache/conftool/dbconfig/20210614-140511-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16505 and previous config saved to /var/cache/conftool/dbconfig/20210614-140217-root.json
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16504 and previous config saved to /var/cache/conftool/dbconfig/20210614-140203-root.json
  • 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2006.codfw.wmnet
  • 13:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16503 and previous config saved to /var/cache/conftool/dbconfig/20210614-135456-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 10%: Repool db1147 after upgrade', diff saved to https://phabricator.wikimedia.org/P16502 and previous config saved to /var/cache/conftool/dbconfig/20210614-135025-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16501 and previous config saved to /var/cache/conftool/dbconfig/20210614-135007-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16500 and previous config saved to /var/cache/conftool/dbconfig/20210614-134713-root.json
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16499 and previous config saved to /var/cache/conftool/dbconfig/20210614-134700-root.json
  • 13:43 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16498 and previous config saved to /var/cache/conftool/dbconfig/20210614-133953-root.json
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16497 and previous config saved to /var/cache/conftool/dbconfig/20210614-133801-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16496 and previous config saved to /var/cache/conftool/dbconfig/20210614-133503-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16495 and previous config saved to /var/cache/conftool/dbconfig/20210614-133442-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 10%: Repool db1170:3317 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16494 and previous config saved to /var/cache/conftool/dbconfig/20210614-133210-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 10%: Repool db1170:3312 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16493 and previous config saved to /var/cache/conftool/dbconfig/20210614-133156-root.json
  • 13:29 effie: restart memcached on codfw
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 50%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16492 and previous config saved to /var/cache/conftool/dbconfig/20210614-132449-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312 db1170:3317 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16491 and previous config saved to /var/cache/conftool/dbconfig/20210614-132235-marostegui.json
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repool es1032 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16490 and previous config saved to /var/cache/conftool/dbconfig/20210614-132000-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16489 and previous config saved to /var/cache/conftool/dbconfig/20210614-131938-root.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16488 and previous config saved to /var/cache/conftool/dbconfig/20210614-130946-root.json
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1032 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16487 and previous config saved to /var/cache/conftool/dbconfig/20210614-130723-marostegui.json
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16486 and previous config saved to /var/cache/conftool/dbconfig/20210614-130547-root.json
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16485 and previous config saved to /var/cache/conftool/dbconfig/20210614-130435-root.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: Repool es1033 after kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16484 and previous config saved to /var/cache/conftool/dbconfig/20210614-125442-root.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16483 and previous config saved to /var/cache/conftool/dbconfig/20210614-125043-root.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16482 and previous config saved to /var/cache/conftool/dbconfig/20210614-124931-root.json
  • 12:37 XioNoX: configure OSPF link-protection on cr3/4-ulsfo - T167306
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16481 and previous config saved to /var/cache/conftool/dbconfig/20210614-123539-root.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1033 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16480 and previous config saved to /var/cache/conftool/dbconfig/20210614-123512-marostegui.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repool es1034 after upgrade', diff saved to https://phabricator.wikimedia.org/P16479 and previous config saved to /var/cache/conftool/dbconfig/20210614-123427-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Restore es1028 original weight', diff saved to https://phabricator.wikimedia.org/P16478 and previous config saved to /var/cache/conftool/dbconfig/20210614-122322-marostegui.json
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Give some weight to es1028 while es1034 gets upgraded', diff saved to https://phabricator.wikimedia.org/P16477 and previous config saved to /var/cache/conftool/dbconfig/20210614-122242-marostegui.json
  • 12:22 dcausse: re-pooling wdqs1012
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16476 and previous config saved to /var/cache/conftool/dbconfig/20210614-122212-marostegui.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174 after schema change', diff saved to https://phabricator.wikimedia.org/P16475 and previous config saved to /var/cache/conftool/dbconfig/20210614-122036-root.json
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2005.codfw.wmnet
  • 12:17 XioNoX: configure OSPF link-protection on cr3-ulsfo:xe-0/1/1 - T167306
  • 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2005.codfw.wmnet
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1148', diff saved to https://phabricator.wikimedia.org/P16474 and previous config saved to /var/cache/conftool/dbconfig/20210614-121101-marostegui.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16473 and previous config saved to /var/cache/conftool/dbconfig/20210614-121031-marostegui.json
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2004.codfw.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2004.codfw.wmnet
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16472 and previous config saved to /var/cache/conftool/dbconfig/20210614-120112-marostegui.json
  • 11:28 effie: restart memcached on mc2019
  • 11:09 effie: restart memcached on codfw memcached gutter pool (mc-gp2* hosts)
  • 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2003.codfw.wmnet
  • 10:52 topranks: T283163: Adding "metric-out minimum-igp" to all internal/Confed BGP groups on CR routers.
  • 10:46 effie: enable puppet on mc*
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2003.codfw.wmnet
  • 10:39 effie: disable puppet on mc* hosts
  • 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2001.codfw.wmnet
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2001.codfw.wmnet
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16471 and previous config saved to /var/cache/conftool/dbconfig/20210614-101839-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16469 and previous config saved to /var/cache/conftool/dbconfig/20210614-100336-root.json
  • 09:56 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 (duration: 02m 37s)
  • 09:54 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16467 and previous config saved to /var/cache/conftool/dbconfig/20210614-094832-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: Repool db1131 after schema change', diff saved to https://phabricator.wikimedia.org/P16466 and previous config saved to /var/cache/conftool/dbconfig/20210614-093329-root.json
  • 09:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1131 for schema change', diff saved to https://phabricator.wikimedia.org/P16465 and previous config saved to /var/cache/conftool/dbconfig/20210614-092234-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16464 and previous config saved to /var/cache/conftool/dbconfig/20210614-092125-root.json
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16463 and previous config saved to /var/cache/conftool/dbconfig/20210614-090622-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16462 and previous config saved to /var/cache/conftool/dbconfig/20210614-085118-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165 after schema change', diff saved to https://phabricator.wikimedia.org/P16461 and previous config saved to /var/cache/conftool/dbconfig/20210614-083614-root.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P16460 and previous config saved to /var/cache/conftool/dbconfig/20210614-081239-marostegui.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16459 and previous config saved to /var/cache/conftool/dbconfig/20210614-081031-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2148', diff saved to https://phabricator.wikimedia.org/P16458 and previous config saved to /var/cache/conftool/dbconfig/20210614-080552-marostegui.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16456 and previous config saved to /var/cache/conftool/dbconfig/20210614-075528-root.json
  • 07:51 marostegui: Depool clouddb1013 to upgrade mysql
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16455 and previous config saved to /var/cache/conftool/dbconfig/20210614-074024-root.json
  • 07:30 marostegui: Reboot db2148 T284852
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2148 T284852', diff saved to https://phabricator.wikimedia.org/P16454 and previous config saved to /var/cache/conftool/dbconfig/20210614-072930-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168 after schema change', diff saved to https://phabricator.wikimedia.org/P16453 and previous config saved to /var/cache/conftool/dbconfig/20210614-072520-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P16452 and previous config saved to /var/cache/conftool/dbconfig/20210614-071839-marostegui.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16451 and previous config saved to /var/cache/conftool/dbconfig/20210614-071742-root.json
  • 07:15 dcausse: restart blazegraph and depool wdqs1012
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16450 and previous config saved to /var/cache/conftool/dbconfig/20210614-070238-root.json
  • 07:01 moritzm: restarting mw canaries to pick up libwebp security updates
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16449 and previous config saved to /var/cache/conftool/dbconfig/20210614-064734-root.json
  • 06:39 moritzm: installing libwebp security updates on buster
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180 after schema change', diff saved to https://phabricator.wikimedia.org/P16448 and previous config saved to /var/cache/conftool/dbconfig/20210614-063231-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P16447 and previous config saved to /var/cache/conftool/dbconfig/20210614-062554-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16446 and previous config saved to /var/cache/conftool/dbconfig/20210614-061226-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16445 and previous config saved to /var/cache/conftool/dbconfig/20210614-060119-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16444 and previous config saved to /var/cache/conftool/dbconfig/20210614-055723-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16443 and previous config saved to /var/cache/conftool/dbconfig/20210614-054615-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16442 and previous config saved to /var/cache/conftool/dbconfig/20210614-054219-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16441 and previous config saved to /var/cache/conftool/dbconfig/20210614-053112-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repool db1113:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16440 and previous config saved to /var/cache/conftool/dbconfig/20210614-052715-root.json
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P16439 and previous config saved to /var/cache/conftool/dbconfig/20210614-051930-marostegui.json
  • 05:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P16438 and previous config saved to /var/cache/conftool/dbconfig/20210614-051608-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P16437 and previous config saved to /var/cache/conftool/dbconfig/20210614-051522-marostegui.json

2021-06-12

  • 13:49 rzl@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused
  • 13:49 rzl@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: alert noise, no impact, x2 is unused

2021-06-11

  • 23:37 mutante: removing firewall hole for mgmt networks to install* because it turned out it can't be used for firmware upgrades
  • 22:08 brennen: gitlab.wikimedia.org currently up with recommended config applied; test data deleted; users can register but not create projects. brennen, dancy, and thcipriani currently marked as admins. may need to reset data again, but hopefully not.
  • 21:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
  • 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2014.codfw.wmnet with reason: REIMAGE
  • 21:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
  • 20:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2013.codfw.wmnet with reason: REIMAGE
  • 20:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
  • 20:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2012.codfw.wmnet with reason: REIMAGE
  • 19:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: REIMAGE
  • 16:40 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
  • 16:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1004
  • 15:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/MediaSearch/extension.json: Make MediaSearch default search experience for all users (duration: 00m 57s)
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16432 and previous config saved to /var/cache/conftool/dbconfig/20210611-150018-root.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16431 and previous config saved to /var/cache/conftool/dbconfig/20210611-144514-root.json
  • 14:44 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 05s)
  • 14:44 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
  • 14:43 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 05s)
  • 14:42 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
  • 14:36 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 14:36 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 14:35 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:34 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:34 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
  • 14:33 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:33 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:32 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:31 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16430 and previous config saved to /var/cache/conftool/dbconfig/20210611-143010-root.json
  • 14:22 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:22 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:20 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:20 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1008.eqiad.wmnet
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after upgrade', diff saved to https://phabricator.wikimedia.org/P16429 and previous config saved to /var/cache/conftool/dbconfig/20210611-141506-root.json
  • 13:53 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1008.eqiad.wmnet
  • 13:53 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 13:53 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1008.eqiad.wmnet with reason: Reparenting from maps1009
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16428 and previous config saved to /var/cache/conftool/dbconfig/20210611-135248-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1153', diff saved to https://phabricator.wikimedia.org/P16427 and previous config saved to /var/cache/conftool/dbconfig/20210611-135036-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1153 for kernel upgrade', diff saved to https://phabricator.wikimedia.org/P16426 and previous config saved to /var/cache/conftool/dbconfig/20210611-133527-marostegui.json
  • 10:46 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 07:29 moritzm: restarting archiva to pick up OpenJDK security updates
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
  • 07:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
  • 06:56 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:56 elukey: rm -rf empty dir /etc/apache2/sites-enabled/.links2 on webperf1001 to avoid puppet changes at every run
  • 05:47 elukey: run systemctl reset-failed ifup@en5.service on doh1001 - T273026
  • 01:10 eileen: process-control config revision is 2aed6ff89b

2021-06-10

  • 23:29 derick@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Citoid/modules/ve/ve.ui.CitoidInspector.js: Backport: CitoidInspector: rename getParameterNames to getOrderedParameterNames (T284786) (duration: 00m 57s)
  • 21:40 urbanecm: End of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # T282699
  • 21:36 urbanecm: Start of urbanecm@mwmaint1002:~$ foreachwiki extensions/WikimediaMaintenance/createExtensionTables.php discussiontools # T282699
  • 21:33 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=testwiki discussiontools # T282699
  • 20:13 mutante: installed tftp client on install1003 for debugging
  • 20:00 jhuneidi@deploy1002: Pruned MediaWiki: 1.37.0-wmf.5 (duration: 03m 33s)
  • 19:31 ryankemper: T265547 Cleanup following merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/698025: `sudo -E cumin -b 5 'P:analytics::cluster::elasticsearch' 'sudo rm -rfv /etc/mjolnir /srv/deployment/search/mjolnir'`
  • 19:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.9 refs T281150
  • 18:49 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikimediaMaintenance/dumpInterwiki.php: b21904e: Remove sep11 interwiki link from dumpinterwiki.php (duration: 01m 08s)
  • 18:45 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 23s)
  • 18:39 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 03s)
  • 18:38 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: 8aeab13: Fire language change hook (T280770) (duration: 01m 07s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: d26968c: wgWelcomeSurveyExperimentalGroups: Use new syntax in CS.php (T284597; T284735) (duration: 01m 08s)
  • 17:11 moritzm: updating bullseye installer image to latest daily image (kernel ABI changed again) T275873
  • 17:09 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:06 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 17:03 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 16:53 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 16:51 moritzm: installing rails security updates
  • 16:37 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: no-op for Beta I2a42c222003 (duration: 01m 07s)
  • 16:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:29 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:24 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 15:09 papaul: power down ms-be2038 for BBU replacement
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16417 and previous config saved to /var/cache/conftool/dbconfig/20210610-123201-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16416 and previous config saved to /var/cache/conftool/dbconfig/20210610-121657-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 60%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16415 and previous config saved to /var/cache/conftool/dbconfig/20210610-120153-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16414 and previous config saved to /var/cache/conftool/dbconfig/20210610-114650-root.json
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 40%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16413 and previous config saved to /var/cache/conftool/dbconfig/20210610-113146-root.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16412 and previous config saved to /var/cache/conftool/dbconfig/20210610-111643-root.json
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16411 and previous config saved to /var/cache/conftool/dbconfig/20210610-110139-root.json
  • 11:00 jbond@deploy1002: Finished deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next (duration: 00m 53s)
  • 10:59 jbond@deploy1002: Started deploy [netbox/deploy@e9f2382]: deploy v2.10.4-wmf4 to netbox-next
  • 10:47 topranks: T283163: Adding "metric-out minimum-igp" to BGP group Confed_eqord on eqiad, codfw and eqdfw CRs.
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16410 and previous config saved to /var/cache/conftool/dbconfig/20210610-104635-root.json
  • 10:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/WikiEditor/modules/jquery.wikiEditor.js: 8a17c43: Fix call to renamed var (T284716) (duration: 01m 25s)
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16409 and previous config saved to /var/cache/conftool/dbconfig/20210610-103132-root.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16408 and previous config saved to /var/cache/conftool/dbconfig/20210610-103032-marostegui.json
  • 10:29 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:28 kormat: running optimize tables against pc1009 (pc3) T282761
  • 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:21 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16407 and previous config saved to /var/cache/conftool/dbconfig/20210610-101858-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16406 and previous config saved to /var/cache/conftool/dbconfig/20210610-100355-root.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 60%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16405 and previous config saved to /var/cache/conftool/dbconfig/20210610-094851-root.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16404 and previous config saved to /var/cache/conftool/dbconfig/20210610-093346-root.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316', diff saved to https://phabricator.wikimedia.org/P16402 and previous config saved to /var/cache/conftool/dbconfig/20210610-093003-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16401 and previous config saved to /var/cache/conftool/dbconfig/20210610-092246-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 40%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16399 and previous config saved to /var/cache/conftool/dbconfig/20210610-091842-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 30%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16398 and previous config saved to /var/cache/conftool/dbconfig/20210610-090345-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 30%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16397 and previous config saved to /var/cache/conftool/dbconfig/20210610-090339-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 20%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16396 and previous config saved to /var/cache/conftool/dbconfig/20210610-084841-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 20%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16395 and previous config saved to /var/cache/conftool/dbconfig/20210610-084835-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 10%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16394 and previous config saved to /var/cache/conftool/dbconfig/20210610-083338-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 10%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16393 and previous config saved to /var/cache/conftool/dbconfig/20210610-083332-root.json
  • 08:25 volans: uploaded spicerack_0.0.53 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 5%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16392 and previous config saved to /var/cache/conftool/dbconfig/20210610-081834-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 5%: Repool db1098:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P16391 and previous config saved to /var/cache/conftool/dbconfig/20210610-081828-root.json
  • 08:17 marostegui: Drop several grants from labswiki (wikitech) T282074
  • 07:57 jynus: reset-failed on cumin1001 after backup rerun
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P16389 and previous config saved to /var/cache/conftool/dbconfig/20210610-075702-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16388 and previous config saved to /var/cache/conftool/dbconfig/20210610-075247-marostegui.json
  • 07:44 jynus: retrying s6 snapshots on eqiad, acking daemon failure
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 100%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16387 and previous config saved to /var/cache/conftool/dbconfig/20210610-073727-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 75%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16386 and previous config saved to /var/cache/conftool/dbconfig/20210610-072224-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 50%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16385 and previous config saved to /var/cache/conftool/dbconfig/20210610-070720-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3316 (re)pooling @ 25%: Repool db1098:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16384 and previous config saved to /var/cache/conftool/dbconfig/20210610-065217-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16383 and previous config saved to /var/cache/conftool/dbconfig/20210610-064916-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316', diff saved to https://phabricator.wikimedia.org/P16382 and previous config saved to /var/cache/conftool/dbconfig/20210610-063745-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16381 and previous config saved to /var/cache/conftool/dbconfig/20210610-063412-root.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16380 and previous config saved to /var/cache/conftool/dbconfig/20210610-061909-root.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16379 and previous config saved to /var/cache/conftool/dbconfig/20210610-061806-root.json
  • 06:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16378 and previous config saved to /var/cache/conftool/dbconfig/20210610-060405-root.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16377 and previous config saved to /var/cache/conftool/dbconfig/20210610-060302-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16376 and previous config saved to /var/cache/conftool/dbconfig/20210610-055327-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16375 and previous config saved to /var/cache/conftool/dbconfig/20210610-055037-root.json
  • 05:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16374 and previous config saved to /var/cache/conftool/dbconfig/20210610-054802-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16373 and previous config saved to /var/cache/conftool/dbconfig/20210610-054759-root.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16372 and previous config saved to /var/cache/conftool/dbconfig/20210610-053534-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P16371 and previous config saved to /var/cache/conftool/dbconfig/20210610-053259-root.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315 after schema change', diff saved to https://phabricator.wikimedia.org/P16370 and previous config saved to /var/cache/conftool/dbconfig/20210610-053255-root.json
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16369 and previous config saved to /var/cache/conftool/dbconfig/20210610-052421-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16368 and previous config saved to /var/cache/conftool/dbconfig/20210610-052030-root.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16367 and previous config saved to /var/cache/conftool/dbconfig/20210610-052017-marostegui.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130 after upgrade', diff saved to https://phabricator.wikimedia.org/P16366 and previous config saved to /var/cache/conftool/dbconfig/20210610-050526-root.json

2021-06-09

  • 22:12 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1002.wikimedia.org
  • 22:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
  • 21:59 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host doh1002.wikimedia.org
  • 21:53 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1002.wikimedia.org
  • 21:51 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh1001.wikimedia.org
  • 21:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh1001.wikimedia.org
  • 21:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/DiscussionTools/modules/dt-ve/CommentTargetWidget.less: Backport: Update surface styles for VE changes (T284567) (duration: 01m 14s)
  • 21:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/includes/language/LanguageConverter.php: Backport: Revert "Add type hint to constructor of LanguageConverter" (T284685) (duration: 01m 24s)
  • 21:08 mutante: rsyncing static-bugzilla HTML from miscweb1002 to deploy1002
  • 21:00 mutante: deploy1002 - creating temp dir /srv/miscweb to rsync static-bugzilla data to, coming from miscweb1002 T281538
  • 20:36 mutante: deployed temp ferm change on deployment servers to let miscweb dump data, puppetized. scap pull from mwdebug1001 works, deployment good to go
  • 19:08 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.9 refs T281150 (duration: 01m 07s)
  • 19:06 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.9 refs T281150
  • 18:07 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php (foreachwiki)
  • 17:52 Krinkle: krinkle@mwmaint1002$ mwscript deleteEqualMessages.php --wiki rmywiki
  • 17:32 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudmetrics1002.eqiad.wmnet
  • 17:32 aborrero@cumin1001: START - Cookbook sre.hosts.remove-downtime for cloudmetrics1002.eqiad.wmnet
  • 17:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 17:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 17:16 jayme: updated python3-docker-report to 0.0.12 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,deneb.codfw.wmnet,registry[2003-2008].codfw.wmnet,registry[1003-1004].eqiad.wmnet
  • 16:35 jayme: import docker-report 0.0.12 into buster-wikimedia
  • 15:37 hnowlan: rebuilding maps2009 as buster master
  • 15:08 vgutierrez: restarting acme-chief on acmechief1001
  • 15:02 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 15:02 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps2009.codfw.wmnet with reason: Rebuilding as buster master
  • 15:01 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 55s)
  • 15:00 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
  • 14:57 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 04s)
  • 14:57 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
  • 14:51 volans@deploy1002: Finished deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o (duration: 00m 15s)
  • 14:50 volans@deploy1002: Started deploy [netbox/deploy@91fd299]: Release v2.10.4-wmf3 to netbox-next.w.o
  • 14:45 moritzm: installing postgresql 9.6 security updates on stretch
  • 14:37 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on all wikis - T282562 (duration: 01m 06s)
  • 14:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on all wikis - T282855 (duration: 01m 06s)
  • 14:23 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate LandingPageImpression schema to EventPlatform on testwiki - T282855 (duration: 01m 07s)
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16358 and previous config saved to /var/cache/conftool/dbconfig/20210609-141807-root.json
  • 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=0; selector: name=maps2009.codfw.wmnet
  • 14:08 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: name=maps2009.codfw.wmnet
  • 13:59 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on testwiki - T282562 (duration: 01m 08s)
  • 13:56 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki1001 - T282469
  • 13:54 XioNoX: Add Routinator 3000 0.9.0 to the APT repo - T282469
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16356 and previous config saved to /var/cache/conftool/dbconfig/20210609-134800-root.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166 after schema change', diff saved to https://phabricator.wikimedia.org/P16355 and previous config saved to /var/cache/conftool/dbconfig/20210609-133257-root.json
  • 13:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16354 and previous config saved to /var/cache/conftool/dbconfig/20210609-132958-marostegui.json
  • 13:12 moritzm: installing nginx security updates
  • 13:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 02m 26s)
  • 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
  • 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 00m 10s)
  • 13:07 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
  • 13:07 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: test master with 698968 (duration: 01m 14s)
  • 13:05 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: test master with 698968
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16351 and previous config saved to /var/cache/conftool/dbconfig/20210609-130114-root.json
  • 12:50 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2038.codfw.wmnet
  • 12:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1 (duration: 00m 53s)
  • 12:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: roll back to HEAD~1
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16350 and previous config saved to /var/cache/conftool/dbconfig/20210609-124610-root.json
  • 12:43 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 28s)
  • 12:42 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:42 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 08s)
  • 12:41 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:41 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 47s)
  • 12:40 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:39 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 41s)
  • 12:39 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16349 and previous config saved to /var/cache/conftool/dbconfig/20210609-123615-root.json
  • 12:35 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2038.codfw.wmnet
  • 12:33 godog: lists1001:rm /var/lib/prometheus/node.d/mailman_queues.prom
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16348 and previous config saved to /var/cache/conftool/dbconfig/20210609-123106-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16347 and previous config saved to /var/cache/conftool/dbconfig/20210609-122111-root.json
  • 12:18 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 03m 38s)
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143 after schema change', diff saved to https://phabricator.wikimedia.org/P16345 and previous config saved to /var/cache/conftool/dbconfig/20210609-121603-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16344 and previous config saved to /var/cache/conftool/dbconfig/20210609-121501-marostegui.json
  • 12:14 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:13 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 53s)
  • 12:12 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 12:10 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 44s)
  • 12:09 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 12:09 hnowlan: running `nodetool decommission` on maps2009
  • 12:06 hnowlan: stopped tilerator on maps2009
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16343 and previous config saved to /var/cache/conftool/dbconfig/20210609-120608-root.json
  • 12:05 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
  • 12:05 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on maps2009.codfw.wmnet with reason: Postgis version juggling
  • 12:04 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps2009.codfw.wmnet
  • 12:03 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 06s)
  • 12:03 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 12:00 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ac43baa: d185728: WelcomeSurveyExperimentalGroups: Use new syntax (T284599) (duration: 01m 19s)
  • 11:59 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 54s)
  • 11:58 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:54 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 41s)
  • 11:54 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:53 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 03m 11s)
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141 after schema change', diff saved to https://phabricator.wikimedia.org/P16342 and previous config saved to /var/cache/conftool/dbconfig/20210609-115104-root.json
  • 11:50 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:49 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 02m 16s)
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16341 and previous config saved to /var/cache/conftool/dbconfig/20210609-114944-marostegui.json
  • 11:47 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:47 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 05s)
  • 11:46 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:46 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: (no justification provided) (duration: 00m 53s)
  • 11:45 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: (no justification provided)
  • 11:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: redeploy HEAD~1 (duration: 01m 55s)
  • 11:38 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: redeploy HEAD~1
  • 11:36 jbond@deploy1002: Finished deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1 (duration: 00m 54s)
  • 11:35 jbond@deploy1002: Started deploy [netbox/deploy@f94ce0f]: redeploy HEAD~1
  • 11:34 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 02m 23s)
  • 11:32 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
  • 11:32 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: re-try (duration: 00m 59s)
  • 11:31 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: re-try
  • 11:27 jbond: drop keep_env from sudo config - #T275852
  • 11:22 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 00m 43s)
  • 11:22 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 11:21 jbond@deploy1002: Finished deploy [netbox/deploy@98cf8df]: (no justification provided) (duration: 01m 15s)
  • 11:20 jbond@deploy1002: Started deploy [netbox/deploy@98cf8df]: (no justification provided)
  • 11:11 awight: EU deployment window complete
  • 11:10 awight@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Set wgAutoConfirmCount to 10 for enwikisource (T284627) (duration: 02m 04s)
  • 10:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
  • 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1130.eqiad.wmnet with reason: REIMAGE
  • 10:15 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 53s)
  • 10:14 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 10:13 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 05m 41s)
  • 10:07 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 10:06 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 38s)
  • 10:06 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T283235', diff saved to https://phabricator.wikimedia.org/P16337 and previous config saved to /var/cache/conftool/dbconfig/20210609-100423-marostegui.json
  • 10:00 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 48s)
  • 09:59 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 09:58 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on schema* after switch towards nginx-light T164456
  • 07:54 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:16 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 06:26 XioNoX: Add 185.71.138.0/24 to network::external and diffscan - T252132
  • 06:12 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16334 and previous config saved to /var/cache/conftool/dbconfig/20210609-053213-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16333 and previous config saved to /var/cache/conftool/dbconfig/20210609-051710-root.json
  • 05:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16332 and previous config saved to /var/cache/conftool/dbconfig/20210609-050206-root.json
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135 after dropping an index', diff saved to https://phabricator.wikimedia.org/P16331 and previous config saved to /var/cache/conftool/dbconfig/20210609-044703-root.json
  • 04:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135 to remove rev_page_id index T163532', diff saved to https://phabricator.wikimedia.org/P16330 and previous config saved to /var/cache/conftool/dbconfig/20210609-044428-marostegui.json
  • 04:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:30 eileen: civicrm revision changed from eac772e9c9 to 31d07115a0, config revision is 931a941a5e
  • 03:01 Amir1: mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=aawiktionary --site-group wiktionary (T284444)
  • 02:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:56 Amir1: clean up of the rest of mbox files (except arbcom) (T282303)
  • 02:55 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 02:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "xfer categories following reimage" --blazegraph_instance categories --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 02:49 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:39 ryankemper: T280382 Re-enabled puppet on `wdqs1010`
  • 01:20 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:37 catrope@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Wikisource OCR on select Wikisources (T283898) (duration: 01m 31s)
  • 00:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1010.eqiad.wmnet --dest wdqs1009.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 00:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
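The db1135 repool entries above (04:47 to 05:32) step through 25%, 50%, 75% and 100% at roughly 15-minute intervals. A minimal sketch of that schedule follows. It assumes fixed steps and a fixed interval, and it is not how dbctl or the repooling script is implemented; the function name `repool_schedule` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def repool_schedule(start, steps=(25, 50, 75, 100), interval_minutes=15):
    """Return (HH:MM, percentage) pairs for a staged repool.

    Assumption: steps are evenly spaced; the real tooling may wait on
    health checks or use a different cadence.
    """
    t0 = datetime.strptime(start, "%H:%M")
    schedule = []
    for i, pct in enumerate(steps):
        when = t0 + timedelta(minutes=i * interval_minutes)
        schedule.append((when.strftime("%H:%M"), pct))
    return schedule


if __name__ == "__main__":
    for when, pct in repool_schedule("04:47"):
        print(f"{when} repool @ {pct}%")
```

Starting from 04:47, the sketch reproduces the four timestamps logged for db1135.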

2021-06-08

  • 22:36 krinkle@deploy1002: Finished deploy [integration/docroot@d4c9e08]: (no justification provided) (duration: 00m 08s)
  • 22:36 krinkle@deploy1002: Started deploy [integration/docroot@d4c9e08]: (no justification provided)
  • 22:21 ryankemper: T284479 Block put back in place. We're back to expected traffic levels. We'll need a more granular mitigation in place before we can lift this block going forward.
  • 22:15 ryankemper: T284479 Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 19 'A:cp-text' 'run-puppet-agent -q'`
  • 22:14 ryankemper: T284479 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698850, running puppet on `cp3052.esams.wmnet`
  • 22:10 ryankemper: T284479 Yup more than enough evidence of a strong upward spike now. Proceeding to revert
  • 22:10 ryankemper: T284479 Already starting to see a large upward spike in requests. Doing a quick sanity check to make sure this is out of the ordinary but I'll likely be putting the block back in place shortly
  • 22:09 ryankemper: T284479 Puppet run complete across all of `cp-text`. Monitoring https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-1h&to=now over the next few minutes to see if we see a large spike in `full_text` and `entity_full_text` queries
  • 22:03 ryankemper: T284479 Successful puppet run on `cp3052`, proceeding to rest of `A:cp-text`: `sudo cumin -b 15 'A:cp-text' 'run-puppet-agent -q'`
  • 22:01 ryankemper: T284479 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/698849, running puppet on `cp3052.esams.wmnet`
  • 21:59 ryankemper: T284479 Prior context: We put a block on a range of Google App Engine IPs yesterday to protect Cirrussearch from a bad actor; now we're going to try lifting the block and seeing if we're still getting slammed with traffic
  • 21:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
  • 21:42 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1009.eqiad.wmnet with reason: REIMAGE
  • 21:29 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1009.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 21:27 ryankemper: T280382 Disabled puppet on `wdqs1010` out of abundance of caution; will re-enable after wdqs1009 is reimaged and xfer back is complete
  • 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:38 bblack: authdns1001: update gdnsd to 3.7.0-2~wmf1
  • 20:18 bblack: authdns2001: update gdnsd to 3.7.0-2~wmf1
  • 19:55 bblack: dns[1235]002: update gdnsd to 3.7.0-2~wmf1
  • 19:53 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.9 refs T281150
  • 19:46 bblack: dns[1235]001: update gdnsd to 3.7.0-2~wmf1
  • 19:43 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:36 ryankemper: T280382 Cancelling the data-transfer run to restart it; realized that the cookbook will start up the `wdqs-updater` again so will locally hack the cookbook on `cumin1001` to prevent that
  • 19:32 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.9/extensions/Echo/modules/nojs/mw.echo.alert.monobook.less: Backport: Fix MonoBook orange banner hover styles (T284496) (duration: 01m 08s)
  • 19:26 bblack: dns400[12]: update gdnsd to 3.7.0-3~wmf1
  • 19:25 bblack: apt: update gdnsd package to gdnsd-3.7.0-2~wmf1 (fix systemd reload issues)
  • 19:20 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1009.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring skolemized wikidata.jnl so we can reimage wdqs1009" --blazegraph_instance blazegraph --without-lvs` on `ryankemper@cumin1001` tmux session `wdqs_1009`
  • 19:20 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:19 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:19 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:18 ryankemper: T280382 `sudo systemctl stop wdqs-updater wdqs-blazegraph` on `wdqs1010` in preparation for transfer
  • 19:08 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo pool` (all caught up on lag)
  • 18:47 bblack: dns4001: update gdnsd to 3.7.0-1~wmf1
  • 18:43 bblack: apt: update gdnsd package to gdnsd-3.7.0-1~wmf1
  • 17:49 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:36 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:25 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:10 elukey: fix dbstore1007's ip address in analytics-in4 on cr{1,2}-eqiad
  • 17:06 jhuneidi@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.9 refs T281150 (duration: 34m 12s)
  • 16:32 jhuneidi@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.9 refs T281150
  • 16:27 papaul: powerdown moss-fe2002 for relocation
  • 16:06 papaul: powerdown ms-backup2002 for relocation
  • 16:02 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 15:40 papaul: powerdown ms-be2061 for relocation
  • 15:40 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 15:33 papaul: powerdown thanos-fe2003 for relocation
  • 15:23 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 4/4 (pc1009) ref P16060, T280605, T282761.
  • 15:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 T282761
  • 15:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2009.codfw.wmnet,pc1009.eqiad.wmnet with reason: Purging parsercache pc3 T282761
  • 15:13 papaul: powerdown cp2034 for relocation
  • 15:04 papaul: powerdown cp2033 for relocation
  • 14:59 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 14:43 moritzm: cleanup now unused nginx mods and former deps (various X11 libs and libxslt) on testreduce1001/scandium after switch towards nginx-light T164456
  • 14:08 marostegui: Restart sanitarium hosts (db2094, db2095, db1154, db1155) to pick up new filters T284106
  • 14:05 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc3 master T282761 (duration: 00m 57s)
  • 14:05 kormat: setting pc1010 as pc3 primary T282761
  • 13:51 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 42s)
  • 13:51 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:48 otto@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 13:41 otto@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 13:40 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 47s)
  • 13:39 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:36 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 03s)
  • 13:35 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:33 otto@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
  • 13:22 otto@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons. - otto@cumin1001
  • 12:15 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1008 as pc2 master T282761 (duration: 00m 57s)
  • 12:14 kormat: setting pc1008 back as pc2 primary T282761
  • 11:54 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ef49422: enwiki: Disable indexing on the Book namespace (T283522) (duration: 00m 56s)
  • 11:46 urbanecm: Start server-side upload for 1 file (T283470)
  • 11:45 moritzm: installing nginx security updates on buster
  • 11:43 urbanecm: Start server-side upload for 2 files (T283645, T283583)
  • 11:39 urbanecm: EU B&C deployment done
  • 11:38 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16329 and previous config saved to /var/cache/conftool/dbconfig/20210608-113857-kormat.json
  • 11:38 moritzm: installing ruby-nokogiri security updates
  • 11:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/WikimediaEvents/: b0b4653: universalLanguageSelector: Add missing properties (T280770) (duration: 00m 56s)
  • 11:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/UniversalLanguageSelector/resources/js/ext.uls.launch.js: 5df13ee: Pass context to compact_language_links.open hook (T280770) (duration: 00m 57s)
  • 11:23 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16328 and previous config saved to /var/cache/conftool/dbconfig/20210608-112354-kormat.json
  • 11:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 73dc708: lvwiki: Enable Growth features in dark mode (T278191; 3/3) (duration: 00m 58s)
  • 11:13 urbanecm@deploy1002: Synchronized wmf-config/config/lvwiki.yaml: 73dc708: lvwiki: Enable Growth features in dark mode (T278191; 2/3) (duration: 00m 56s)
  • 11:12 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 73dc708: lvwiki: Enable Growth features in dark mode (T278191; 1/3) (duration: 00m 57s)
  • 11:10 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=lvwiki growthexperiments # T278191
  • 11:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16327 and previous config saved to /var/cache/conftool/dbconfig/20210608-110850-kormat.json
  • 11:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: abd4010: enwiki: Deploy Growth features to 2% of new accounts (T281896) (duration: 00m 57s)
  • 11:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
  • 11:01 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Rebooting pc1008
  • 10:53 kormat@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: reimaged to buster T283131', diff saved to https://phabricator.wikimedia.org/P16326 and previous config saved to /var/cache/conftool/dbconfig/20210608-105346-kormat.json
  • 10:50 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 53s)
  • 10:49 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
  • 10:16 liw: testing upcoming Scap release on beta
  • 10:01 XioNoX: upgrade Routinator 3000 to 0.9.0 on rpki2001 - T282469
  • 09:58 jbond@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4) (duration: 00m 54s)
  • 09:57 jbond@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (4)
  • 09:52 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:04 jayme: removing docker-images from registry: releng/ci-jessie, releng/ci-src-setup, releng/composer-php56, releng/composer-test-php56, releng/npm, releng/npm-test, releng/npm-test-3d2png, releng/npm-test-graphoid, releng/npm-test-librdkafka, releng/npm-test-maps-service, releng/php56, releng/quibble-jessie, releng/quibble-jessie-hhvm, releng/quibble-jessie-php56 - T251918
  • 08:31 dcausse: depooling wdqs1006 (lag)
  • 08:29 dcausse: restarting blazegraph on wdqs1006
  • 08:19 elukey@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:13 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:13 elukey@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2002.codfw.wmnet
  • 07:41 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 07:40 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:37 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16324 and previous config saved to /var/cache/conftool/dbconfig/20210608-072937-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16323 and previous config saved to /var/cache/conftool/dbconfig/20210608-071433-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16322 and previous config saved to /var/cache/conftool/dbconfig/20210608-065930-root.json
  • 06:52 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool after upgrade', diff saved to https://phabricator.wikimedia.org/P16321 and previous config saved to /var/cache/conftool/dbconfig/20210608-064426-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for upgrade', diff saved to https://phabricator.wikimedia.org/P16320 and previous config saved to /var/cache/conftool/dbconfig/20210608-064055-marostegui.json
  • 06:27 elukey: clean some airflow logs on an-airflow1001 as one off to free space (had a chat with the Search team first)
  • 05:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 05:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 05:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2123.codfw.wmnet with reason: REIMAGE
  • 04:54 marostegui: Repool clouddb1019:3314
  • 04:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:38 ryankemper: T284445 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "repairing overinflated blazegraph journal" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs`
  • 02:37 ryankemper: T284445 after manually stopping blazegraph/wdqs-updater, `sudo rm -fv /srv/wdqs/wikidata.jnl` on `wdqs1012` (clearing old overinflated journal file away before xferring new one)
  • 02:34 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo depool` (catching up on ~7h of lag)
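Cookbook runs appear in this log as START/END pairs, for example the sre.wdqs.data-transfer run from 02:38 to 04:07 above. The sketch below pairs such entries and computes the elapsed minutes. It assumes the entry layout seen in this archive (newest-first, `HH:MM user@host: START - Cookbook name`) and that both entries fall on the same day; the regular expression and the function name `cookbook_runs` are illustrative, not part of any Wikimedia tooling.

```python
import re
from datetime import datetime

# Matches the cookbook START/END layout used in this log (assumed, not a spec).
ENTRY = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): "
    r"(?P<kind>START|END \((?P<status>\w+)\)) - Cookbook (?P<name>\S+)"
)


def cookbook_runs(lines):
    """Pair START/END entries; `lines` is newest-first, as in the log."""
    open_runs = {}
    runs = []
    for line in reversed(lines):  # walk oldest to newest
        m = ENTRY.search(line)
        if not m:
            continue
        key = (m["who"], m["name"])
        t = datetime.strptime(m["time"], "%H:%M")
        if m["kind"] == "START":
            open_runs[key] = t
        elif key in open_runs:
            start = open_runs.pop(key)
            minutes = int((t - start).total_seconds() // 60)
            runs.append((m["name"], m["status"], minutes))
    return runs


if __name__ == "__main__":
    sample = [
        "04:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)",
        "02:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer",
    ]
    print(cookbook_runs(sample))
```

An END entry with no matching START in the given lines (such as one whose run began the previous day) is skipped rather than guessed at.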

2021-06-07

  • 21:26 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0)
  • 21:12 sbassett: Deployed security patch for T284364
  • 19:30 ryankemper: T284479 [Cirrussearch] We'll keep monitoring. For now this incident is resolved. Glancing at our current volume relative to what we'd expect, the numbers we see match what we'd expect. If we're accidentally banning any innocent requests, they must be an incredibly small percentage of the total; otherwise we'd see significantly lower volume than expected
  • 19:25 ryankemper: T284479 [Cirrussearch] Seeing the expected drop in `entity_full_text` requests here: https://grafana-rw.wikimedia.org/d/000000455/elasticsearch-percentiles?viewPanel=47&orgId=1&from=now-12h&to=now As a result we're no longer rejecting any requests
  • 19:21 ryankemper: T284479 [Cirrussearch] We're working on rolling out https://gerrit.wikimedia.org/r/698607, which will ban search API requests that match the Google App Engine IP range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`
  • 19:19 cdanis: T284479 ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕞🍵 sudo cumin -b16 'A:cp-text' "run-puppet-agent"
  • 19:07 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve T284462 (duration: 04m 53s)
  • 19:02 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve T284462
  • 19:01 andrew@deploy1002: Finished deploy [horizon/deploy@6199b67]: disable shelve/unshelve (duration: 02m 01s)
  • 18:59 andrew@deploy1002: Started deploy [horizon/deploy@6199b67]: disable shelve/unshelve
  • 18:57 herron: prometheus3001: moved /srv back to vda1 filesystem T243057
  • 18:26 urbanecm: [urbanecm@mwmaint1002 /srv/mediawiki/php-1.37.0-wmf.7]$ mwscript extensions/GrowthExperiments/maintenance/initWikiConfig.php --wiki=skwiki --phab=T284149
  • 18:24 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/includes/WelcomeSurvey.php: 368b5d9: 0e79aee: WelcomeSurvey backports (T284127, T284257; 2/2) (duration: 00m 57s)
  • 18:22 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/extension.json: 368b5d9: 0e79aee: WelcomeSurvey backports (T284127, T284257; 1/2) (duration: 00m 56s)
  • 18:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/initWikiConfig.php: 7089728: b2482fb: initWikiConfig GE backports (T284072) (duration: 00m 58s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 15e0910: skwiki: Make Growth features available in dark mode (T284149; 3/3) (duration: 00m 56s)
  • 18:14 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 15e0910: skwiki: Make Growth features available in dark mode (T284149; 2/3) (duration: 00m 56s)
  • 18:14 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers
  • 18:14 ottomata: rolling restart of kafka jumbo brokers - T283067
  • 18:13 urbanecm@deploy1002: Synchronized wmf-config/config/skwiki.yaml: 15e0910: skwiki: Make Growth features available in dark mode (T284149; 1/3) (duration: 00m 59s)
  • 18:12 otto@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0)
  • 18:04 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=skwiki growthexperiments # T284149
  • 18:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5de2f8b: Set WelcomeSurveyEnableWithHomepage (T281896, T284257) (duration: 00m 59s)
  • 17:53 otto@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker
  • 17:53 ottomata: rolling restart of kafka jumbo mirror makers - T283067
  • 17:17 ryankemper: [Cirrussearch] We're seeing ~10% of current requests being rejected by poolcounter, due to ~2x expected `eqiad.full_text` query volume and ~30x expected `eqiad.entity_full_text` query volume
  • 16:56 ryankemper: [WDQS] `ryankemper@wdqs1005:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph locked up)
  • 16:51 razzi: run homer '*.eqiad.wmnet' diff
  • 16:49 ottomata: restarting mysqld analytics-meta replica on db1108 to apply config change - T272973
  • 16:31 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6 (duration: 04m 29s)
  • 16:27 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@19313f7]: Bump glent jar to 0.2.6
  • 16:09 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6 (duration: 00m 35s)
  • 16:09 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f236b95]: Bump glent jar to 0.2.6
  • 14:57 moritzm: installing remaining lz4 security updates on buster
  • 14:35 moritzm: installing isc-dhcp security updates
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113 (s5,s6) after upgrade', diff saved to https://phabricator.wikimedia.org/P16315 and previous config saved to /var/cache/conftool/dbconfig/20210607-141722-marostegui.json
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6) for upgrade', diff saved to https://phabricator.wikimedia.org/P16314 and previous config saved to /var/cache/conftool/dbconfig/20210607-141307-marostegui.json
  • 13:35 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3) (duration: 00m 52s)
  • 13:34 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (3)
  • 13:34 moritzm: installing libxml2 security updates on stretch
  • 13:32 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 01m 14s)
  • 13:31 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 13:28 volans@deploy1002: Finished deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next (duration: 00m 54s)
  • 13:27 volans@deploy1002: Started deploy [netbox/deploy@c70df91]: Force deploy of gerrit/672831 to netbox-next
  • 12:41 moritzm: removing now obsolete Java 8 packages from gerrit* T268225
  • 12:36 moritzm: removing now obsolete Java 8 packages from contint* T268225
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 moritzm: installing nginx security updates on buster
  • 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki --add-prefix=BROKEN --fix # T284442
  • 12:22 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=wikimaniawiki # T284442
  • 11:09 Lucas_WMDE: EU backport+config window done
  • 11:08 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add 2021 namespaces for wikimania wiki (T284235) (duration: 00m 56s)
  • 10:48 volans: reset netbox-next DB with the latest prod dump
  • 10:42 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:41 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1004.wikimedia.org
  • 10:38 godog: downgrade grafana to 7.4.2 on grafana2001 - T282863
  • 10:36 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
  • 10:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1004.wikimedia.org
  • 10:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: REIMAGE
  • 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1003.wikimedia.org
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1003.wikimedia.org
  • 10:28 kormat: reimaging db1157 T283131
  • 10:24 moritzm: remove now obsolete nginx mods and dependencies on htmldumper1001 T164456
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2006.wikimedia.org
  • 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2006.wikimedia.org
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica2005.wikimedia.org
  • 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica2005.wikimedia.org
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host theemin.codfw.wmnet
  • 10:08 kormat@cumin1001: dbctl commit (dc=all): 'db1157 depooling: reimage to buster T283131', diff saved to https://phabricator.wikimedia.org/P16311 and previous config saved to /var/cache/conftool/dbconfig/20210607-100822-kormat.json
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host theemin.codfw.wmnet
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:43 moritzm: upgrading bullseye hosts to latest packages in testing
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 09:03 moritzm: installing imagemagick security updates on stretch
  • 06:05 marostegui: Upgrade mysql on dbstore1003 T283235
  • 05:57 marostegui: Stop dbstore1004 to clone dbstore1007 T283125
  • 05:37 marostegui: Depool clouddb1020 (s5, s8) for upgrade
  • 05:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
  • 05:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2113.codfw.wmnet with reason: REIMAGE
  • 04:48 marostegui: Depool clouddb1019:3314 (long running alter table)
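The 19:21 entry above describes a ban on search API requests that come from the Google App Engine range `2600:1900::0/28` AND whose user agent includes `HeadlessChrome`. The actual rule was deployed through the puppet change referenced in that entry and is not reproduced here. As a minimal sketch of the same two-condition match, assuming a hypothetical `is_blocked` helper that receives the client IP and user agent as strings:

```python
import ipaddress

# Range and user-agent marker taken from the 19:21 log entry.
BLOCKED_NETWORK = ipaddress.ip_network("2600:1900::0/28")
BLOCKED_UA_MARKER = "HeadlessChrome"


def is_blocked(client_ip, user_agent):
    """Return True only when BOTH conditions hold (hypothetical helper)."""
    address = ipaddress.ip_address(client_ip)
    in_range = address in BLOCKED_NETWORK
    ua_matches = BLOCKED_UA_MARKER in user_agent
    return in_range and ua_matches


if __name__ == "__main__":
    print(is_blocked("2600:1900:4000::1", "Mozilla/5.0 HeadlessChrome/90.0"))
    print(is_blocked("2600:1900:4000::1", "Mozilla/5.0 Firefox/89.0"))
```

Requiring both conditions keeps the match narrow: other clients in the same range, and headless browsers elsewhere, are not affected.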

2021-06-05

  • 16:16 Amir1: deleting all private archives of mm2. All are inaccessible now (T282303)
  • 15:21 Amir1: delete mbox files of group D and E in mm2 (T282303)
  • 14:35 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 00:21 mutante: backup1001 - systemctl bacula-dir works again (restoring backup for non-existing host)
  • 00:18 mutante: backup1001 systemctl reload bacula-dir fails

2021-06-04

  • 22:08 cwhite@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4001.wikimedia.org
  • 21:51 cwhite@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh4001.wikimedia.org
  • 20:59 bblack: repool cp1087 - T278729
  • 20:11 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
  • 20:09 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1087.eqiad.wmnet with reason: REIMAGE
  • 19:06 bblack: depool cp1087 - T278729
  • 18:21 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:36 razzi@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 17:33 razzi@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 17:33 razzi@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 17:28 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 17:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: REIMAGE
  • 15:25 topranks: Adding 1:1 NAT configuration for fran2001 / analytics.codfw.wikimedia.org to pfw3-codfw (backup site)
  • 14:47 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I434d9c (duration: 00m 56s)
  • 14:46 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/extension.json: Iea41ab (duration: 00m 56s)
  • 14:44 krinkle@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/DiscussionTools/includes/: Iea41ab (duration: 00m 59s)
  • 14:41 krinkle@deploy1002: Scap failed!: 9/9 canaries failed their endpoint checks(https://en.wikipedia.org)
  • 13:39 Krinkle: mwmaint1002: Running purge_parsercache_now.php on pc1008, server 3/4, ref T282761
  • 13:33 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 12:46 marostegui: Upgrade mysql on clouddb1016 T283235
  • 12:27 marostegui: Upgrade mysql on clouddb1015 T283235
  • 11:20 jbond: upload debmonitor-client_0.3.0-1+deb10u3_all.deb to apt
  • 10:59 topranks: Running homer for Gerrit 698162: Set up BGP peering to doh5001 in eqsin, triggering DoH /24 announcement there.
  • 09:47 ema: pool cp1087 T278729
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1003.eqiad.wmnet
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1003.eqiad.wmnet
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2002.codfw.wmnet
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2002.codfw.wmnet
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16304 and previous config saved to /var/cache/conftool/dbconfig/20210604-091742-root.json
  • 09:06 ema: reboot cp1087 T278729
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16303 and previous config saved to /var/cache/conftool/dbconfig/20210604-090239-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16302 and previous config saved to /var/cache/conftool/dbconfig/20210604-084735-root.json
  • 08:33 marostegui: Upgrade db1110 T283235
  • 08:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P16301 and previous config saved to /var/cache/conftool/dbconfig/20210604-083232-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P16300 and previous config saved to /var/cache/conftool/dbconfig/20210604-082956-marostegui.json
  • 08:20 godog: upgrade karma to 0.86-1
  • 07:38 jynus: stop and upgrade db1150 T283235
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16299 and previous config saved to /var/cache/conftool/dbconfig/20210604-073326-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16298 and previous config saved to /var/cache/conftool/dbconfig/20210604-073318-root.json
  • 07:29 moritzm: cleanup now unused nginx mods and former deps on install* and puppetdb* servers after switch towards nginx-light (various X11 libs and libxslt) T164456
  • 07:24 moritzm: cleanup now unused nginx mods and former deps on install* servers after switch towards nginx-light (various X11 libs and libxslt)
  • 07:19 urbanecm: Password reset for SUL User:Dominic_Mayers (T282656)
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16297 and previous config saved to /var/cache/conftool/dbconfig/20210604-071823-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16296 and previous config saved to /var/cache/conftool/dbconfig/20210604-071815-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16295 and previous config saved to /var/cache/conftool/dbconfig/20210604-070319-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16294 and previous config saved to /var/cache/conftool/dbconfig/20210604-070311-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P16293 and previous config saved to /var/cache/conftool/dbconfig/20210604-064815-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316', diff saved to https://phabricator.wikimedia.org/P16292 and previous config saved to /var/cache/conftool/dbconfig/20210604-064807-root.json
  • 06:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:42 marostegui: Upgrade mysql on db1096:3315 db1096:3316
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 db1096:3315', diff saved to https://phabricator.wikimedia.org/P16291 and previous config saved to /var/cache/conftool/dbconfig/20210604-064242-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16290 and previous config saved to /var/cache/conftool/dbconfig/20210604-055521-root.json
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16289 and previous config saved to /var/cache/conftool/dbconfig/20210604-054017-root.json
  • 05:26 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16288 and previous config saved to /var/cache/conftool/dbconfig/20210604-052514-root.json
  • 05:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 05:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 05:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:17 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2002.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 05:16 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P16287 and previous config saved to /var/cache/conftool/dbconfig/20210604-051010-root.json
  • 04:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
  • 04:41 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2002.codfw.wmnet with reason: REIMAGE
  • 04:25 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2002.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:22 ryankemper: T280382 `wdqs2001.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 03:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:33 ryankemper: [WDQS] `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1013.eqiad.wmnet --reason "repair overinflated wikidata jnl" --blazegraph_instance blazegraph`
  • 02:32 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 02:30 ryankemper: T280382 `wdqs1005.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 02:25 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo pool` (caught up on lag)
  • 02:09 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 02:06 ebernhardson: post-deploy restart airflow-(webserver|scheduler) on an-airflow1001
  • 02:05 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift (duration: 04m 40s)
  • 02:00 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@500179f]: Stop overwriting uploads in swift
  • 01:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 00:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:08 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 57s)
  • 00:07 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2001.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 00:06 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:05 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)

2021-06-03

  • 23:41 reedy@deploy1002: Synchronized wmf-config/CommonSettings.php: T280886 (duration: 00m 56s)
  • 23:40 reedy@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T280886 (duration: 00m 57s)
  • 23:33 mutante: installing OS on fresh VM doh5001
  • 23:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2001.codfw.wmnet with reason: REIMAGE
  • 23:09 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Restrict changetags to sysops and bots on meta T283625 (duration: 00m 58s)
  • 22:41 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2001.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 22:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1008.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 22:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:36 ryankemper: T280382 Cancelled transfer to `wdqs1005`; the source host `wdqs1013` has a `wikidata.jnl` that is 80% too big; will transfer from different node -> `wdqs1005` and then fix the journal on `wdqs1013` after
  • 22:36 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 22:35 ryankemper: T280382 `wdqs2005.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 22:28 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:15 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 21:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:54 shdubsh: restart kafka on kafka-logging to take new retention config
  • 20:47 sbassett: Deployed security patch for T282932
  • 20:37 ebernhardson: restart mjolnir-kafka-bulk-daemon on search-loader[12]001
  • 20:35 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container (duration: 01m 00s)
  • 20:34 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:34 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:34 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@1c40c83]: bulk daemon: accept events for search_updates swift container
  • 20:34 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 20:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 19:58 mutante: [mwmaint1002:~] $ /usr/local/bin/systemd-timer-mail-wrapper -T root@mwmaint1002.eqiad.wmnet --only-on-error /usr/local/bin/cross-validate-accounts
  • 19:56 mutante: [mwmaint1002:~] $ sudo systemctl start daily_account_consistency_check.service
  • 19:41 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh5002.wikimedia.org
  • 19:41 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5002.wikimedia.org
  • 19:39 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs (duration: 04m 27s)
  • 19:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh5001.wikimedia.org
  • 19:34 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@339d402]: ship pip and wheel packages for virtualenvs
  • 19:33 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images - T251918 - icinga-wm> RECOVERY - Check systemd state on deneb is OK
  • 19:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:32 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:32 mutante: [deneb:~] $ sudo systemctl start docker-reporter-releng-images
  • 19:28 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2005.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 19:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 19:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1013.eqiad.wmnet --dest wdqs1005.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:27 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:23 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh5001.wikimedia.org
  • 19:14 mutante: install1003 - restarting nginx after we switched from nginx-full to nginx-light package, same on other install servers T164456
  • 19:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
  • 19:03 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
  • 19:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2005.codfw.wmnet with reason: REIMAGE
  • 19:01 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1005.eqiad.wmnet with reason: REIMAGE
  • 18:52 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter (duration: 00m 31s)
  • 18:51 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@f40d41a]: resolve npe in datawriter
  • 18:46 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2005.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 18:46 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1005.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 18:39 ryankemper: [WDQS] depooled `wdqs1012` (has ~15 hours of lag to catch up on)
  • 18:37 ryankemper: [WDQS] `ryankemper@wdqs1012:~$ sudo systemctl restart wdqs-blazegraph` (blazegraph on the host has been locked up for ~16 hours based off of https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1622683465757&to=1622745461547)
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
  • 18:37 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp1087.eqiad.wmnet with reason: replaced DIMM https://phabricator.wikimedia.org/T278729
  • 18:28 mutante: temp. disabling puppet on install* servers. switching nginx to light variant (T164456)
  • 18:16 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter (duration: 00m 15s)
  • 18:16 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@659a8e4]: resolve npe in datawriter
  • 17:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
  • 17:47 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
  • 17:47 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: REIMAGE
  • 17:45 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: REIMAGE
  • 17:37 brennen: gitlab1001: re-running install-gitlab-server.sh
  • 17:16 urandom: remove dropped Cassandra keyspace snapshots -- T258414
  • 16:55 ejegg: updated payments-wiki from 6fac77f60e to 7be0534b91
  • 16:23 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:49 topranks: Gerrit 697993: Change BGP peer IP for doh3002 on esams CRs.
  • 15:27 papaul: pdu replacement complete
  • 15:25 moritzm: upgrading gitlab to 13.11.5
  • 15:08 papaul: disconnect ps2-d8-codfw for replacement
  • 14:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 14:54 topranks: Gerrit 697970: Add Wikidough BGP peerings on esams CRs for doh3001 and doh3002.
  • 14:23 moritzm: installing nginx security updates on buster
  • 14:12 moritzm: installing postgresql-9.6 security updates
  • 13:55 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:25 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:17 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16285 and previous config saved to /var/cache/conftool/dbconfig/20210603-130059-root.json
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16284 and previous config saved to /var/cache/conftool/dbconfig/20210603-124556-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16283 and previous config saved to /var/cache/conftool/dbconfig/20210603-123243-root.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16282 and previous config saved to /var/cache/conftool/dbconfig/20210603-123052-root.json
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16281 and previous config saved to /var/cache/conftool/dbconfig/20210603-121739-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16280 and previous config saved to /var/cache/conftool/dbconfig/20210603-121548-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16279 and previous config saved to /var/cache/conftool/dbconfig/20210603-121205-marostegui.json
  • 12:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16278 and previous config saved to /var/cache/conftool/dbconfig/20210603-121133-root.json
  • 12:06 moritzm: restarting FPM on mw canaries to pick up lz4 update
  • 12:03 moritzm: installing lz4 security updates on buster
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16277 and previous config saved to /var/cache/conftool/dbconfig/20210603-120235-root.json
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16276 and previous config saved to /var/cache/conftool/dbconfig/20210603-115628-root.json
  • 11:53 moritzm: installing curl security updates on stretch
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16275 and previous config saved to /var/cache/conftool/dbconfig/20210603-114731-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16274 and previous config saved to /var/cache/conftool/dbconfig/20210603-114503-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16273 and previous config saved to /var/cache/conftool/dbconfig/20210603-114325-marostegui.json
  • 11:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16272 and previous config saved to /var/cache/conftool/dbconfig/20210603-114124-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16271 and previous config saved to /var/cache/conftool/dbconfig/20210603-113000-root.json
  • 11:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16270 and previous config saved to /var/cache/conftool/dbconfig/20210603-112620-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16269 and previous config saved to /var/cache/conftool/dbconfig/20210603-112243-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16268 and previous config saved to /var/cache/conftool/dbconfig/20210603-111456-root.json
  • 11:10 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e840968: jawiki: extended confirmed should be 120 days since first edit, not registration (T284212) (duration: 00m 58s)
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16267 and previous config saved to /var/cache/conftool/dbconfig/20210603-110906-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16266 and previous config saved to /var/cache/conftool/dbconfig/20210603-105953-root.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16265 and previous config saved to /var/cache/conftool/dbconfig/20210603-105536-marostegui.json
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16264 and previous config saved to /var/cache/conftool/dbconfig/20210603-105402-root.json
  • 10:52 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:41 godog: test librenms/AM paging
  • 10:40 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16263 and previous config saved to /var/cache/conftool/dbconfig/20210603-103858-root.json
  • 10:28 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16262 and previous config saved to /var/cache/conftool/dbconfig/20210603-102354-root.json
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache T282761
  • 10:21 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc2008.codfw.wmnet,pc1008.eqiad.wmnet with reason: Purging parsercache T282761
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16261 and previous config saved to /var/cache/conftool/dbconfig/20210603-101950-marostegui.json
  • 10:13 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc2 primary T282761 (duration: 00m 58s)
  • 09:38 marostegui: Deploy schema change on s3 codfw master (with replication) - T282373 T282372 T282371
  • 09:37 moritzm: upgrading eqiad to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) T235162
  • 08:55 moritzm: uploading gitlab-ce 13.11.5-ce to apt.wikimedia.org thirdparty/gitlab
  • 08:43 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:37 moritzm: upgrading codfw to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range) T235162
  • 08:23 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:19 oblivian@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 08:09 moritzm: upgrading esams/eqsin to debmonitor-client 0.3.0 (along with deleting/recreating system user within 100-499 range)
  • 07:52 ryankemper: [WDQS] Pooled `wdqs1008` and `wdqs2006` (all caught up on lag)
  • 07:48 moritzm: uploaded debmonitor-client 0.3.0-1+deb10u2 to apt.wikimedia.org
  • 06:24 ryankemper: [WDQS] De-pooled `wdqs1008` and `wdqs2006` (~1 hour of lag to catch up on)
  • 06:23 ryankemper: T280382 `wdqs2006.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 06:23 ryankemper: T280382 `wdqs1008.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:05 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:20 marostegui: Deploy schema change on db1121, lag will appear on s4 (commonswiki) wiki replicas - T266486 T268392 T273360
  • 05:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121', diff saved to https://phabricator.wikimedia.org/P16259 and previous config saved to /var/cache/conftool/dbconfig/20210603-051853-marostegui.json
  • 05:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16258 and previous config saved to /var/cache/conftool/dbconfig/20210603-051402-root.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16257 and previous config saved to /var/cache/conftool/dbconfig/20210603-045859-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16256 and previous config saved to /var/cache/conftool/dbconfig/20210603-044355-root.json
  • 04:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 04:36 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:36 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 04:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:34 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 04:30 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2004.codfw.wmnet --dest wdqs2006.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 04:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 04:29 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1005.eqiad.wmnet --dest wdqs1008.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 04:29 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16255 and previous config saved to /var/cache/conftool/dbconfig/20210603-042851-root.json
  • 02:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
  • 02:20 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1008.eqiad.wmnet with reason: REIMAGE
  • 02:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
  • 02:07 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1008.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 02:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2006.codfw.wmnet with reason: REIMAGE
  • 02:05 ryankemper: T280382 `wdqs1003.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 02:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:51 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2006.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 01:47 ryankemper: T280382 `wdqs2003.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 01:43 ryankemper: [WDQS] Pooled `wdqs1004` (caught up on lag)
  • 01:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:40 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Gadgets: Backport: Reduce message parse in GadgetHooks::getPreferences (second time) (T58633 T278650), Try II (duration: 00m 57s)
  • 00:36 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/user/UserOptionsManager.php: Backport: user: Accept options-messages for multiselect user options (T58633 T278650) (duration: 00m 57s)
  • 00:35 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:35 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 00:18 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 00:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 00:18 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)

2021-06-02

  • 23:57 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 23:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1004.eqiad.wmnet --dest wdqs1003.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 23:56 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:47 ryankemper: T280382 `wdqs1004.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.9T 998G 1.8T 36% /srv`
  • 23:41 ladsgroup@deploy1002: scap failed: average error rate on 4/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 23:38 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:28 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2007.codfw.wmnet --dest wdqs2003.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 23:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 23:26 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid10`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 23:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:18 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes: Backport: Allow html form field option 'options-messages' to get parsed (T58633) (duration: 01m 01s)
  • 22:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 22:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 22:48 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable wgVectorConsolidateUserLinks on the beta cluster (T266536) (duration: 00m 57s)
  • 22:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage_2`
  • 22:34 ryankemper: T280382 Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'P{apt*}' 'sudo rm -rfv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
  • 22:30 ryankemper: T280382 Cleaned up no-longer-needed files removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/697832 => `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'P{install*}' 'sudo rm -fv /srv/tftpboot/buster-raid0-installer/pxelinux.cfg'`
  • 22:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
  • 22:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1003.eqiad.wmnet with reason: REIMAGE
  • 22:19 Amir1: setting charset of all tables in wikitech to binary (T284108 T269348)
  • 22:11 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage_2`
  • 22:08 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 22:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs1004.eqiad.wmnet
  • 22:07 ryankemper@puppetmaster1001: conftool action : set/pooled=yes; selector: name=wdqs2007.codfw.wmnet
  • 22:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:59 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1004.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:39 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
  • 21:38 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3002.wikimedia.org
  • 21:37 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1004.eqiad.wmnet with reason: REIMAGE
  • 21:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 21:30 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 21:28 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
  • 21:21 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=wdqs2007.codfw.wmnet
  • 21:17 ryankemper: `ryankemper@wdqs1013:~$ sudo depool` (catching up on 17.9h lag)
  • 21:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
  • 21:10 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2007.codfw.wmnet` on `ryankemper@cumin2002` tmux session `wdqs_reimage`
  • 21:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs1004.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:58 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh3001.wikimedia.org
  • 20:49 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts doh3001.wikimedia.org
  • 20:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh3002.wikimedia.org
  • 20:21 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3002.wikimedia.org
  • 20:00 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh3001.wikimedia.org
  • 19:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh3001.wikimedia.org
  • 18:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e9c981d: Revert "enwiktionary: Raise AF emergency disable treshold+count" (T283460) (duration: 00m 58s)
  • 18:11 urbanecm: Deployed security patch for T281972
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4bf76fc: Make DiscussionTools replytool available for everyone on wikitech (T283119) (duration: 00m 58s)
  • 17:33 legoktm: disabled Kadirselcuk gerrit account, +1 spam (and blocked elsewhere)
  • 16:55 legoktm: restarted apache2 on lists1001 for https://gerrit.wikimedia.org/r/697805
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:19 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:10 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cescout1001.eqiad.wmnet
  • 16:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:59 sukhe@cumin1001: START - Cookbook sre.hosts.decommission for hosts cescout1001.eqiad.wmnet
  • 13:16 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 13:14 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1125.eqiad.wmnet with reason: REIMAGE
  • 12:05 jbond: enable puppet fleet wide. post changing puppetdb to use nginx-light #T164456
  • 11:54 jbond: disable puppet fleet wide. changing puppetdb to use nginx-light #T164456
  • 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/includes/actions/InfoAction.php: 85feaa1: InfoAction: Cast wgNamespaceProtection to array (T283751) (duration: 01m 00s)
  • 11:08 jbond: update mod_auth_cas T264605
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f12e368: Investigate MediaSearch usability on other wikis (T278984) (duration: 00m 57s)
  • 11:04 jbond: upload libapache2-mod-auth-cas_1.2-1 for buster and stretch - #T264605
  • 11:01 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf11u1_amd64.deb - #T264605
  • 10:44 topranks: Commit pfw policy 1622570851 to pfw3-codfw and pfw3-eqiad to support new host fran2001 (T282056)
  • 10:21 kormat@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 kormat@cumin1001: START - Cookbook sre.dns.netbox
  • 10:01 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbstore1006.eqiad.wmnet
  • 09:51 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbstore1006.eqiad.wmnet
  • 09:14 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/Translate/scripts/moveTranslatablePage.php --wiki=metawiki --reason='OTRS -> VRTS renaming process; see Phab:T280392 and Phab:T280396 (request)' 'OTRS' 'VRT' 'Quiddity (WMF)' # T284118
  • 08:12 moritzm: removed eight inactive addresses from ops@ list
  • 07:44 moritzm: installing squid security updates
  • 06:54 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
  • 06:51 razzi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: REIMAGE
  • 06:38 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:34 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 75%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16249 and previous config saved to /var/cache/conftool/dbconfig/20210602-050234-root.json [REPLAY FROM 2021-06-02 05:02:34]
  • 05:36 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2071', diff saved to https://phabricator.wikimedia.org/P16248 and previous config saved to /var/cache/conftool/dbconfig/20210602-045736-marostegui.json [REPLAY FROM 2021-06-02 04:57:36]
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2071', diff saved to https://phabricator.wikimedia.org/P16247 and previous config saved to /var/cache/conftool/dbconfig/20210602-045717-marostegui.json [REPLAY FROM 2021-06-02 04:57:17]
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 50%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16246 and previous config saved to /var/cache/conftool/dbconfig/20210602-044730-root.json [REPLAY FROM 2021-06-02 04:47:31]
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 25%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16245 and previous config saved to /var/cache/conftool/dbconfig/20210602-043227-root.json [REPLAY FROM 2021-06-02 04:32:27]
  • 05:32 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:31 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Fix pageterms API call for Special:Nearby in Wikidata (T281639) (duration: 00m 56s) [REPLAY FROM 2021-06-01 21:44:06]
  • 05:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [REPLAY FROM 2021-06-01 19:42:38]
  • 05:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox [REPLAY FROM 2021-06-01 19:29:26]
  • 05:28 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16251 and previous config saved to /var/cache/conftool/dbconfig/20210602-051919-marostegui.json
  • 05:18 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3314 (re)pooling @ 100%: Repool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16250 and previous config saved to /var/cache/conftool/dbconfig/20210602-051738-root.json
  • off: restart tcpircbot-logmsgbot on alert1001 - T284123
  • 04:56 marostegui: Test

2021-06-01

  • 21:09 andrewbogott: dropping a bunch of tables from the labswiki db as per T284108
  • 17:23 Amir1: starting deletion of mbox files on lists1001 for mailman2, first reading-web-team.mbox, then smallest lists (T282303)
  • 16:31 moritzm: updating debmonitor clients to 0.3.0 (along with cleanup of sysuser UID allocation)
  • 15:38 legoktm: stopped mailman2 service on lists1001 (T52864)
  • 15:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 15:16 ryankemper: T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
  • 15:16 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 14:59 topranks: Restoring Lumen CCT 442550293 to normal metric / bring back into service (T274234)
  • 13:56 marostegui: Stop mysql on db2079 (codfw master) - T283743
  • 13:53 topranks: Draining Lumen CCT 442550293 to do some comparative bandwidth tests from eqiad to codfw (T274234)
  • 13:53 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f75774: cawiki: Fix help panel links (T280673) (duration: 00m 58s)
  • 13:48 otto@deploy1002: Finished deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973 (duration: 02m 58s)
  • 13:45 otto@deploy1002: Started deploy [analytics/refinery@c0a02e5] (hadoop-test): deploy to an-test-coord1001 to get airflow/dags/hello_world.py - T272973
  • 13:43 topranks: Restoring Telia CT IC-307235 to normal metric / bring back into service (T274234)
  • 13:08 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
  • 13:06 jynus@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2098.codfw.wmnet with reason: REIMAGE
  • 12:12 dcausse: re-pooling wdqs1005 (caught-up lag)
  • 12:06 moritzm: installing djvulibre security updates
  • 11:16 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 11:14 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2003.codfw.wmnet with reason: REIMAGE
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e4989d2: Enable "Diff" RSS feed on meta (T283380) (duration: 00m 58s)
  • 11:04 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 10:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
  • 10:39 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Postgis version juggling
  • 10:38 jiji@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
  • 09:37 topranks: Draining Telia CT IC-307235 to do some comparative bandwidth tests from eqiad to codfw (T274234)
  • 08:04 hashar: Restarted Gerrit on gerrit1001 for Java 11 upgrade # T268225
  • 08:02 hashar: Restarted Gerrit on gerrit2001 for Java 11 upgrade # T268225
  • 07:26 dcausse: depooling wdqs1005 (lag)
  • 07:14 moritzm: installing nginx security updates
  • 05:56 legoktm: restarting mailman3 on lists1001
  • 05:37 legoktm: uploaded django-allauth_0.44.0+ds-1~bpo10+1 mailman3_3.3.3-1~bpo10+4 to apt.wm.o
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3314', diff saved to https://phabricator.wikimedia.org/P16242 and previous config saved to /var/cache/conftool/dbconfig/20210601-053137-marostegui.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 100%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16241 and previous config saved to /var/cache/conftool/dbconfig/20210601-052349-root.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 75%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16240 and previous config saved to /var/cache/conftool/dbconfig/20210601-050845-root.json
  • 04:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 50%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16239 and previous config saved to /var/cache/conftool/dbconfig/20210601-045341-root.json
  • 04:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1147 (re)pooling @ 25%: Repool db1147', diff saved to https://phabricator.wikimedia.org/P16238 and previous config saved to /var/cache/conftool/dbconfig/20210601-043837-root.json
  • 00:46 legoktm@deploy1002: Synchronized logos/config.yaml: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 07s)
  • 00:43 legoktm@deploy1002: Synchronized wmf-config/logos.php: Revert "Use eswiki 20th anniversary logos" (T280908) (duration: 01m 00s)

2021-05-31

  • 07:32 legoktm: deleted all outgoing list mail that is for a gmail address being unsubscribed T284003
  • 07:30 legoktm: deleted all outgoing list mail that is for a yahoo/aol address being unsubscribed T284003
  • 07:23 legoktm: deleting all outgoing list mail that has a subject that starts with "You have been unsubscribed from the" T284003
  • 06:33 legoktm: manually unsubscribed ahalfaker [at] wikimedia.org from scoring-internal list, triggering mailman bounce loop T282348#7124014
  • 06:22 legoktm: sudo systemctl restart mailman3 on lists1001, bounce runner crashed

2021-05-29

  • 14:44 elukey: execute apt-get clean on an-airflow1001 to free space
  • 14:40 elukey@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=cp1087.eqiad.wmnet

2021-05-28

2021-05-27

  • 23:56 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
  • 23:54 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab1004.eqiad.wmnet with reason: REIMAGE
  • 23:45 thcipriani@deploy1002: Synchronized README: Config: Revert "README: deployment training" (duration: 00m 55s)
  • 23:38 derick@deploy1002: Synchronized README: Config: README: deployment training (duration: 00m 55s)
  • 23:21 egardner@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable MediaSearch Assessment filter (T276257) (duration: 00m 57s)
  • 22:06 urbanecm: Invalidate bot password for `PKM@PKMbot` (T283839)
  • 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda #T279545
  • 20:37 jbond: add eugene-chernov, strofimovsky01, il to ldap nda
  • 19:53 James_F: Manually create missing SecurePoll DB tables on mnwwiktionary, taywiki, and trvwiki for T283844
  • 19:48 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.7
  • 19:15 tgr: US morning deploys done
  • 19:12 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Enable Add Links for 50% of new users and all old ones (T277356) (duration: 01m 04s)
  • 19:03 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments: Backport: Help panel: SwitchEditorPanel fixes (T282800) Avoid session loading when loading task types in help panel RL data (T282800) Add Link: Fix homepage PV token and newcomer task token logging (T283765) (duration: 01m 05s)
  • 18:57 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:56 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: ptwiki: Add 'flow-delete' to 'eliminator' user group (T283266) (duration: 01m 04s)
  • 18:49 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments: Backport: Help panel: SwitchEditorPanel fixes (T282800) Avoid session loading when loading task types in help panel RL data (T282800) Add Link: Fix homepage PV token and newcomer task token logging (T283765) (duration: 01m 06s)
  • 18:22 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 18:09 tgr@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable Growth's community configuration on the pilot wikis (T283809) (duration: 01m 06s)
  • 17:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:20 James_F: Running SecurePoll maintenance script cli/updateNotBlockedKey.php for all wikis T277079
  • 17:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:05 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:59 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:58 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1007.eqiad.wmnet --dest wdqs1006.eqiad.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs1006's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_disk`
  • 15:58 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following runaway inflation of wdqs2004's wikidata.jnl" --blazegraph_instance blazegraph` on `ryankemper@cumin2002` tmux session `wdqs_disk`
  • 15:56 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 ryankemper: T280382 (fixing couple wrong host names in last log line) `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2001`, `wdqs2002`, and `wdqs2008`, have a 975G `wikidata.jnl`
  • 15:49 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 ryankemper: T280382 `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`, and `wdqs2004` and `wdqs2001` have a 975G `wikidata.jnl`. It's not clear why there's such a big divergence
  • 15:41 ryankemper: T280382 `wdqs2004` inexplicably has a 2.5TB `wikidata.jnl`. By comparison `wdqs1006` has a 1.6T `wikidata.jnl`
  • 15:12 XioNoX: test netconf over ssh on cr3-ulsfo
  • 15:03 effie: disable puppet mc2019
  • 14:14 moritzm: bounce keyholder-agent on cumin2001 to drop homer key (now on 2002 only)
  • 12:57 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index with gerrit:696307 applied
  • 12:55 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
  • 12:50 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Repool pc1007 as pc1 master T282761 (duration: 01m 04s)
  • 12:47 tgr: EU deploys done
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: Backport: Add Link: Prevent double-opening of the post-edit dialog (T283120) Always delete from search index in AddLinkSubmissionHandler (T283606) (duration: 01m 06s)
  • 12:40 topranks: cr2-eqord: Gerrit 696383: Removing IPv4 Anycast ranges from bgp_out policy.
  • 12:39 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: Backport: Add Link: Prevent double-opening of the post-edit dialog (T283120) Add Link: Prevent double-opening of the post-edit dialog (T283120) (duration: 01m 06s)
  • 12:25 tgr@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't update backButton visibility if not set (T283511) (duration: 01m 06s)
  • 11:51 tgr@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/VisualEditor/modules/ve-mw/ui/dialogs/ve.ui.MWTransclusionDialog.js: Backport: Don't update backButton visibility if not set (T283511) (duration: 01m 06s)
  • 10:27 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793
  • 10:27 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2082.codfw.wmnet with reason: Rebuilding db2094:s8 from db2082 T283793
  • 10:23 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082
  • 10:23 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dborch1001.wikimedia.org with reason: Rebuilding db2094:s8 from db2082
  • 09:46 kormat: restarting mariadb on pc1007 to upgrade it
  • 08:35 topranks: removing stale peers (AS8674 / Netnod and AS57695 / Misaka) from cr2-esams
  • 08:30 moritzm: installing libx11 security updates
  • 07:45 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in AMS
  • 07:44 legoktm: adding stephane at kiwix as owner of offline-l per email
  • 07:43 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr's in eqsin
  • 07:42 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to add Wikidough prefix aggregate config on cr2-eqord
  • 07:20 topranks: cmooney@cumin1001 Gerrit 694305: Run homer to announce Wikidough Anycast range from cr's in ulsfo
  • 07:14 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config to cr1-eqdfw
  • 07:11 topranks: cmooney@cumin1001 Gerrit 694305: Add Wikidough Anycast range to aggregate config to cr2-codfw
  • 06:47 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs1003.eqiad.wmnet
  • 06:43 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 13s)
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 100%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16227 and previous config saved to /var/cache/conftool/dbconfig/20210527-060953-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1147', diff saved to https://phabricator.wikimedia.org/P16226 and previous config saved to /var/cache/conftool/dbconfig/20210527-055507-marostegui.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 75%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16225 and previous config saved to /var/cache/conftool/dbconfig/20210527-055450-root.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 50%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16224 and previous config saved to /var/cache/conftool/dbconfig/20210527-053946-root.json
  • 05:29 ryankemper: `ryankemper@cloudelastic1003:~$ sudo run-puppet-agent --force`
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1148 (re)pooling @ 25%: Repool db1148', diff saved to https://phabricator.wikimedia.org/P16223 and previous config saved to /var/cache/conftool/dbconfig/20210527-052442-root.json

2021-05-26

  • 23:07 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid primary connection in SqlModuleDependencyStore (2) (duration: 01m 06s)
  • 23:03 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid primary connection in SqlModuleDependencyStore (2) (duration: 01m 06s)
  • 22:17 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid opening a connection to master when not needed (duration: 01m 06s)
  • 22:10 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/resourceloader/dependencystore/SqlModuleDependencyStore.php: Backport: resourceloader: Avoid opening a connection to master when not needed (duration: 01m 07s)
  • 21:22 tgr: T283606: running mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki={ar,bn,cs,vi}wiki --verbose --search-index
  • 19:58 twentyafterfour: finished deploying wmf.7 and error levels appear unchanged. refs T281148
  • 19:57 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
  • 19:55 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1018.eqiad.wmnet with reason: REIMAGE
  • 19:51 twentyafterfour@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.7 (duration: 01m 07s)
  • 19:50 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train (duration: 05m 12s)
  • 19:50 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.7
  • 19:45 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (hadoop-test): Regular analytics weekly train
  • 19:44 twentyafterfour: train is unblocked, proceeding to deploy wmf.7 to group1 wikis refs T281148
  • 19:44 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
  • 19:44 otto@deploy1002: Started deploy [analytics/refinery@c02cef1] (thin): Regular analytics weekly train THIN
  • 19:43 otto@deploy1002: Finished deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3 (duration: 01m 00s)
  • 19:42 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 3
  • 19:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: 9f3410b: Add Link: Suppress the blue dot on the edit button (T283094) (duration: 01m 07s)
  • 19:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/suggestededits/ext.growthExperiments.SuggestedEdits.Guidance.js: 512d72e: Add Link: Suppress the blue dot on the edit button (T283094) (duration: 01m 07s)
  • 19:25 urbanecm@deploy1002: Synchronized dblists/visualeditor-nondefault.dblist: 80abdf9: 92d2952: Enable VisualEditor by default at ptwikinews and plwikinews (T282846, T283033) (duration: 01m 09s)
  • 19:21 otto@deploy1002: Started deploy [analytics/refinery@c02cef1]: Regular analytics weekly train take 2
  • 19:17 legoktm: legoktm@deploy1002:~$ sudo -E kubectl delete pod kask-production-6d6869b697-m2qjs -n sessionstore
  • 19:16 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train (duration: 01m 23s)
  • 19:15 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train
  • 18:12 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f66b3b: Enable wgCiteResponsiveReferences on svwiki (T281622) (duration: 01m 06s)
  • 18:07 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 07b804b: Enable DiscussionTools on wikitech (T283119) (duration: 01m 05s)
  • 17:51 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:39 legoktm@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' .
  • 17:16 legoktm@deploy1002: Synchronized private/PrivateSettings.php: Set $wgShellboxSecretKey - T281423 (duration: 01m 14s)
  • 17:02 moritzm: restarting FPM on mw canaries to pick up libx11 update
  • 16:51 moritzm: installing libx11 security updates
  • 16:38 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr2-codfw - Wikidough Anycast
  • 16:12 marostegui: Reboot db2107 (codfw master) T282072
  • 16:10 marostegui: Reboot db2103 (codfw master) T282072
  • 16:09 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: T283503
  • 16:09 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: [WIP] applying anycast update: T283503
  • 16:01 papaul: powerdown ms-be2038 for BBU replacement
  • 15:41 effie: enable puppet on mc2019
  • 15:31 marostegui: Cold reset db2107 idrac T283727
  • 15:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: T283503
  • 15:23 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 0:45:00 on malmok.wikimedia.org with reason: applying anycast update: T283503
  • 15:22 topranks: cmooney@cumin1001 Running homer to deploy Gerrit 694305 changes to cr1-codfw - Wikidough Anycast
  • 15:18 urbanecm: otrs_wikiwiki was moved to vrt-wiki.wikimedia.org (T280400)
  • 15:12 topranks: Merging https://gerrit.wikimedia.org/r/c/operations/homer/public/+/694305/ - Add Wikidough Anycast range to network config
  • 15:11 urbanecm@deploy1002: Synchronized wmf-config/: 490435e: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (T280400) (duration: 01m 07s)
  • 15:08 urbanecm@deploy1002: Synchronized multiversion/MWMultiVersion.php: 945ee9c: Move otrs-wiki.wikimedia.org to vrt-wiki.wikimedia.org (T280400; 1/2) (duration: 01m 06s)
  • 15:02 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 03m 18s)
  • 14:59 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST (duration: 05m 24s)
  • 14:53 otto@deploy1002: Started deploy [analytics/refinery@b787999] (hadoop-test): Regular analytics weekly train TEST
  • 14:50 otto@deploy1002: Finished deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN (duration: 00m 07s)
  • 14:49 otto@deploy1002: Started deploy [analytics/refinery@b787999] (thin): Regular analytics weekly train THIN
  • 14:49 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 14:49 otto@deploy1002: Finished deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 30m 22s)
  • 14:47 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: REIMAGE
  • 14:31 moritzm: updated bullseye d-i image to 2021-05-26 daily image T275873
  • 14:19 otto@deploy1002: Started deploy [analytics/refinery@b787999]: Regular analytics weekly train [analytics/refinery@e536abd]
  • 14:18 otto@deploy1002: deploy aborted: Regular analytics weekly train [analytics/refinery@e536abd] (duration: 00m 06s)
  • 14:18 otto@deploy1002: Started deploy [analytics/refinery@e536abd]: Regular analytics weekly train [analytics/refinery@e536abd]
  • 14:05 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@5d7c993]: (no justification provided) (duration: 00m 14s)
  • 14:05 mbsantos@deploy1002: Started deploy [kartotherian/deploy@5d7c993]: (no justification provided)
  • 14:03 hashar@deploy1002: Finished deploy [integration/docroot@ebee5d3]: composer/npm updates (duration: 00m 09s)
  • 14:03 hashar@deploy1002: Started deploy [integration/docroot@ebee5d3]: composer/npm updates
  • 11:44 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: b3c2941: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 07s)
  • 11:39 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php: 86bba48: Allow running fixLinkRecommendationData --search-index in production (T283606) (duration: 01m 06s)
  • 11:30 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/GrowthExperiments/: GrowthExperiments backports (T283544; T282899; T282546) (duration: 01m 06s)
  • 11:26 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/: GrowthExperiments backports (T283544; T282899; T282546) (duration: 01m 19s)
  • 11:05 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Test Wikidata: Enable empty list to object serialization (T241422) (duration: 01m 19s)
  • 10:26 moritzm: installing lz4 security updates on buster
  • 10:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 10:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on labstore1007.wikimedia.org with reason: T281045
  • 09:55 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/Wikibase: Backport: Wrap list of acceptable site ids with an APCu cache in API (duration: 01m 18s)
  • 09:45 godog: rm /root/prometheus from prometheus5001 - old transition files
  • 09:42 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.7/extensions/Wikibase: Backport: Wrap list of acceptable site ids with an APCu cache in API (duration: 02m 12s)
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 100%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16222 and previous config saved to /var/cache/conftool/dbconfig/20210526-093647-root.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 75%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16221 and previous config saved to /var/cache/conftool/dbconfig/20210526-092144-root.json
  • 09:13 elukey: deploy https://gerrit.wikimedia.org/r/c/operations/homer/public/+/695192 on {cr1|cr2}-eqiad - T225005
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 50%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16220 and previous config saved to /var/cache/conftool/dbconfig/20210526-090640-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1106 (re)pooling @ 25%: Repool db1106', diff saved to https://phabricator.wikimedia.org/P16219 and previous config saved to /var/cache/conftool/dbconfig/20210526-085137-root.json
  • 08:12 _joe_: purging images on deneb
  • 08:11 kormat: running 'optimize table' over parsercache db on pc1007 with replication enabled T282761
  • 07:14 ryankemper: Pooled `wdqs1013` (caught up on lag), de-pooled `wdqs2003` (should not have been pooled due to reimage failure)
  • 07:13 ryankemper@puppetmaster2001: conftool action : set/pooled=no; selector: name=wdqs2003.codfw.wmnet
  • 05:46 marostegui: Stop MySQL on clouddb1021 to upgrade mysql
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16215 and previous config saved to /var/cache/conftool/dbconfig/20210526-051935-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1148', diff saved to https://phabricator.wikimedia.org/P16214 and previous config saved to /var/cache/conftool/dbconfig/20210526-050919-marostegui.json
  • 05:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16213 and previous config saved to /var/cache/conftool/dbconfig/20210526-050431-root.json
  • 04:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16212 and previous config saved to /var/cache/conftool/dbconfig/20210526-044928-root.json
  • 04:35 marostegui: Deploy schema change on db1106, this will generate lag on s1 (enwiki) on wiki replicas T266486 T268392 T273360
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16211 and previous config saved to /var/cache/conftool/dbconfig/20210526-043439-marostegui.json
  • 04:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P16210 and previous config saved to /var/cache/conftool/dbconfig/20210526-043424-root.json
  • 03:29 eileen: process-control config revision is 7b646533da
  • 00:47 eileen: civicrm revision changed from 584b96452a to eac772e9c9, config revision is 2ca92c3c3c
  • 00:27 mutante: phab2001 - restarted apache2

2021-05-25

  • 23:09 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0)
  • 22:39 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:21 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 22:04 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 22:04 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:58 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 21:58 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:13 razzi@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99)
  • 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters
  • 21:13 razzi@cumin1001: END (ERROR) - Cookbook sre.hadoop.roll-restart-workers (exit_code=97)
  • 21:13 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 20:40 razzi@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0)
  • 20:28 razzi@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers
  • 20:00 twentyafterfour@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.7
  • 19:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:12 twentyafterfour@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.7 (duration: 33m 29s)
  • 19:12 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 twentyafterfour@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.7
  • 18:08 krinkle@deploy1002: Synchronized wmf-config/CommonSettings.php: I2ebe96 (duration: 00m 56s)
  • 17:34 Krinkle: mwmaint1002: Running purge-parsercache-now.php on server 2/4 (pc1007, depooled spare). Ref P16060, T280605, T282761.
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 100%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16207 and previous config saved to /var/cache/conftool/dbconfig/20210525-173031-root.json
  • 17:22 effie: disable puppet on mc2019 (for tests)
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 75%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16206 and previous config saved to /var/cache/conftool/dbconfig/20210525-171527-root.json
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 50%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16205 and previous config saved to /var/cache/conftool/dbconfig/20210525-170024-root.json
  • 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1164 (re)pooling @ 25%: Repool db1164', diff saved to https://phabricator.wikimedia.org/P16203 and previous config saved to /var/cache/conftool/dbconfig/20210525-164520-root.json
  • 12:55 urbanecm@deploy1002: Synchronized static/images/project-logos/: 63ad5fda: Revert "Add svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)
  • 12:52 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 94ede526: Revert "Use svwiki 20th anniversary logos" (T282389) (duration: 00m 56s)
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1164', diff saved to https://phabricator.wikimedia.org/P16200 and previous config saved to /var/cache/conftool/dbconfig/20210525-122127-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'remove db1124 from dbctl', diff saved to https://phabricator.wikimedia.org/P16199 and previous config saved to /var/cache/conftool/dbconfig/20210525-120718-marostegui.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124 will be moved to the test cluster', diff saved to https://phabricator.wikimedia.org/P16198 and previous config saved to /var/cache/conftool/dbconfig/20210525-113521-marostegui.json
  • 11:26 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:26 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 11:21 Lucas_WMDE: EU backport&config window done
  • 11:20 lucaswerkmeister-wmde@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Change HTTP to HTTPS for concept URIs on Commons (T258590) (duration: 00m 56s)
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16196 and previous config saved to /var/cache/conftool/dbconfig/20210525-111719-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16195 and previous config saved to /var/cache/conftool/dbconfig/20210525-110215-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16194 and previous config saved to /var/cache/conftool/dbconfig/20210525-104711-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repool db1169', diff saved to https://phabricator.wikimedia.org/P16193 and previous config saved to /var/cache/conftool/dbconfig/20210525-103208-root.json
  • 09:58 ema: cp3054: upgrade varnish to latest LTS (6.0.7-1wm1) T264398
  • 09:28 jynus: updating puppet facts on cloud from puppetmaster1001
  • 09:05 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761
  • 09:05 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc[2007,2010].codfw.wmnet,pc1007.eqiad.wmnet with reason: Purging parsercache T282761
  • 09:01 kormat: stopping replication on pc1010 T282761
  • 09:00 kormat@deploy1002: Synchronized wmf-config/db-eqiad.php: Set pc1010 as pc1 primary T282761 (duration: 00m 58s)
  • 08:57 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:20 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
  • 08:18 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
  • 08:17 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2007.codfw.wmnet with reason: REIMAGE
  • 08:16 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
  • 08:16 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2006.codfw.wmnet with reason: REIMAGE
  • 08:14 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2005.codfw.wmnet with reason: REIMAGE
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16192 and previous config saved to /var/cache/conftool/dbconfig/20210525-080234-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P16191 and previous config saved to /var/cache/conftool/dbconfig/20210525-074950-marostegui.json
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16190 and previous config saved to /var/cache/conftool/dbconfig/20210525-074730-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16189 and previous config saved to /var/cache/conftool/dbconfig/20210525-073227-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repool db1184', diff saved to https://phabricator.wikimedia.org/P16188 and previous config saved to /var/cache/conftool/dbconfig/20210525-071723-root.json
  • 06:16 kart_: Updated cxserver to 2021-05-15-034540-production (T276214)
  • 06:05 kartik@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:58 kartik@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
  • 05:53 kartik@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
  • 05:14 marostegui: Reload daily_account_consistency_check.service on mwmaint1002
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 100%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16187 and previous config saved to /var/cache/conftool/dbconfig/20210525-050921-root.json
  • 04:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 75%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16186 and previous config saved to /var/cache/conftool/dbconfig/20210525-045417-root.json
  • 04:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 50%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16185 and previous config saved to /var/cache/conftool/dbconfig/20210525-043914-root.json
  • 04:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1184', diff saved to https://phabricator.wikimedia.org/P16184 and previous config saved to /var/cache/conftool/dbconfig/20210525-043234-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P16183 and previous config saved to /var/cache/conftool/dbconfig/20210525-043129-marostegui.json
  • 04:25 marostegui: Stop MySQL on dbstore1004 to clone dbstore1006 T283125
  • 04:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1149 (re)pooling @ 25%: Repool db1149', diff saved to https://phabricator.wikimedia.org/P16181 and previous config saved to /var/cache/conftool/dbconfig/20210525-042410-root.json
  • 02:06 James_F: 1.37.0-wmf.7 was branched at 7ee6a2e for T281148 by the TrainBranchBot
  • 00:48 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:44 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 00:37 bstorm: labstore1007 downtimed for maintenance T281045

2021-05-24

  • 21:43 legoktm@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:40 legoktm@cumin1001: START - Cookbook sre.dns.netbox
  • 19:32 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:23 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:20 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 19:15 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 18:33 urbanecm: Morning B&C deployment done
  • 18:31 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e9cd344: Disable Education Program namespaces in hewiki (T217137) (duration: 00m 56s)
  • 18:29 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/skins/Vector/: 1742532687b: Introduce the vector-body class (T283206) (duration: 00m 57s)
  • 17:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:39 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:35 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 16:17 jynus@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
  • 16:15 jynus@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2004.codfw.wmnet with reason: REIMAGE
  • 16:14 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1022.eqiad.wmnet
  • 15:55 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1022.eqiad.wmnet
  • 15:52 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:47 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:45 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:41 twentyafterfour: deploying phabricator hotfix (and restarting php7.3-fpm on phab1001)
  • 15:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:09 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1021.eqiad.wmnet
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 100%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16176 and previous config saved to /var/cache/conftool/dbconfig/20210524-150926-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 75%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16175 and previous config saved to /var/cache/conftool/dbconfig/20210524-145422-root.json
  • 14:50 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1021.eqiad.wmnet
  • 14:47 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash1020.eqiad.wmnet
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 50%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16174 and previous config saved to /var/cache/conftool/dbconfig/20210524-143919-root.json
  • 14:36 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts logstash1020.eqiad.wmnet
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3311 (re)pooling @ 25%: Repool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16173 and previous config saved to /var/cache/conftool/dbconfig/20210524-142415-root.json
  • 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:44 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:44 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 13:43 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventstreams' for release 'canary' .
  • 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:41 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:40 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' .
  • 13:39 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' .
  • 13:37 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:36 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:35 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop' for release 'production' .
  • 13:34 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 13:34 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 13:34 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 13:33 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 13:33 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'changeprop-jobqueue' for release 'production' .
  • 12:18 urbanecm: Uninstalling Flow from ruwiki: Delete all pages in NS2600 (Flow's Topic) in ruwiki via deleteBatch.php (T282132; P16170)
  • 12:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 47e040b: ruwiki: Uninstall Flow (T282132) (duration: 00m 56s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311', diff saved to https://phabricator.wikimedia.org/P16169 and previous config saved to /var/cache/conftool/dbconfig/20210524-113711-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 100%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16168 and previous config saved to /var/cache/conftool/dbconfig/20210524-112011-root.json
  • 11:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
  • 11:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1183.eqiad.wmnet with reason: Schema change
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1129e01: Remove wgGEMentorshipMigrationStage (T279853) (duration: 00m 57s)
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 75%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16167 and previous config saved to /var/cache/conftool/dbconfig/20210524-110508-root.json
  • 11:03 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 829c61d: Deploy Growth features to newcomers on bgwiki, urwiki (T280824, T280067) (duration: 00m 56s)
  • 10:51 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 10:51 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on maps1009.eqiad.wmnet with reason: Planet reimport
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 50%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16166 and previous config saved to /var/cache/conftool/dbconfig/20210524-105004-root.json
  • 10:35 mbsantos@deploy1002: Finished deploy [tilerator/deploy@6bfdab5]: (no justification provided) (duration: 00m 16s)
  • 10:35 mbsantos@deploy1002: Started deploy [tilerator/deploy@6bfdab5]: (no justification provided)
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3311 (re)pooling @ 25%: Repool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16165 and previous config saved to /var/cache/conftool/dbconfig/20210524-103501-root.json
  • 10:34 mbsantos@deploy1002: Finished deploy [kartotherian/deploy@a9a577a]: (no justification provided) (duration: 00m 15s)
  • 10:34 mbsantos@deploy1002: Started deploy [kartotherian/deploy@a9a577a]: (no justification provided)
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16164 and previous config saved to /var/cache/conftool/dbconfig/20210524-075958-root.json
  • 07:49 XioNoX: bump Equinix Chicago RS max prefix
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311', diff saved to https://phabricator.wikimedia.org/P16163 and previous config saved to /var/cache/conftool/dbconfig/20210524-074659-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16162 and previous config saved to /var/cache/conftool/dbconfig/20210524-074454-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 50%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16161 and previous config saved to /var/cache/conftool/dbconfig/20210524-072950-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Repool db1135', diff saved to https://phabricator.wikimedia.org/P16160 and previous config saved to /var/cache/conftool/dbconfig/20210524-071447-root.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1149 - schema change', diff saved to https://phabricator.wikimedia.org/P16159 and previous config saved to /var/cache/conftool/dbconfig/20210524-052747-marostegui.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 100%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16158 and previous config saved to /var/cache/conftool/dbconfig/20210524-051345-root.json
  • 05:09 legoktm: restarting mailman3 on lists1001, bounce runner crashed
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 75%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16157 and previous config saved to /var/cache/conftool/dbconfig/20210524-045841-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 50%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16156 and previous config saved to /var/cache/conftool/dbconfig/20210524-044337-root.json
  • 04:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
  • 04:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on db1135.eqiad.wmnet with reason: Schema change
  • 04:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1135', diff saved to https://phabricator.wikimedia.org/P16155 and previous config saved to /var/cache/conftool/dbconfig/20210524-043654-marostegui.json
  • 04:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1142 (re)pooling @ 25%: Repool db1142', diff saved to https://phabricator.wikimedia.org/P16154 and previous config saved to /var/cache/conftool/dbconfig/20210524-042834-root.json

2021-05-23

  • 14:25 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: EMERGENCY: f752f8b: enwiktionary: Raise AF emergency disable threshold+count (T283460) (duration: 00m 57s)

2021-05-22

  • 22:13 legoktm: reset 2FA for User:Yuvipanda on wikitech
  • 21:07 ryankemper: [WDQS] Pooled `wdqs1006` (caught up on lag), de-pooled `wdqs1013` (8 hours)
  • 16:35 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript deleteEqualMessages.php cswiki --delete

2021-05-21

  • 22:32 bstorm: upload nfsd-ldap: 1.2+deb10u1 to buster-wikimedia T283385
  • 18:24 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:22 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:14 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:39 ppchelko@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:36 ppchelko@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:29 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:28 legoktm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 19s)
  • 17:21 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:17 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:07 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:40 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:16 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:14 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:11 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:09 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:06 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:03 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:02 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:01 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:19 clarakosi@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:14 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 15:07 clarakosi@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:57 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:56 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:42 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:20 clarakosi@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:13 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:41 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 12:59 reedy@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 11s)
  • 12:56 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 12:34 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetdb-api
  • 12:24 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mathoid' for release 'production' .
  • 12:24 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=docker-registry
  • 12:23 jayme@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=docker-registry
  • 12:23 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 12:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16150 and previous config saved to /var/cache/conftool/dbconfig/20210521-122253-root.json
  • 12:15 topranks: "Removing BGP peering sessions to LinkedIn AS14413 at AMS-IX / cr2-esams as they are no longer on the exchange."
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16149 and previous config saved to /var/cache/conftool/dbconfig/20210521-120749-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16148 and previous config saved to /var/cache/conftool/dbconfig/20210521-115246-root.json
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repool db1134', diff saved to https://phabricator.wikimedia.org/P16147 and previous config saved to /var/cache/conftool/dbconfig/20210521-113742-root.json
  • 10:01 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2008.codfw.wmnet
  • 09:51 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2007.codfw.wmnet
  • 09:41 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2006.codfw.wmnet
  • 09:32 jayme@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host registry2005.codfw.wmnet
  • 09:32 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
  • 09:28 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
  • 09:26 gehel: depooling wdqs1006 to catch up on lag
  • 09:24 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2008.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2007.codfw.wmnet
  • 09:21 jayme@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host registry2006.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2008.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2007.codfw.wmnet
  • 09:15 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2006.codfw.wmnet
  • 09:14 jayme@cumin1001: START - Cookbook sre.ganeti.makevm for new host registry2005.codfw.wmnet
  • 08:56 kormat: deploying cumin2002 grants to production T276589
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1002.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1001.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2004.wikimedia.org
  • 08:41 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica2003.wikimedia.org
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16146 and previous config saved to /var/cache/conftool/dbconfig/20210521-082009-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P16145 and previous config saved to /var/cache/conftool/dbconfig/20210521-080540-marostegui.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16144 and previous config saved to /var/cache/conftool/dbconfig/20210521-080506-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16143 and previous config saved to /var/cache/conftool/dbconfig/20210521-075002-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repool db1119', diff saved to https://phabricator.wikimedia.org/P16142 and previous config saved to /var/cache/conftool/dbconfig/20210521-073459-root.json
  • 06:32 moritzm: installing libspring-java security updates on stretch
  • 05:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 100%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16141 and previous config saved to /var/cache/conftool/dbconfig/20210521-053027-root.json
  • 05:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
  • 05:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1006.eqiad.wmnet with reason: REIMAGE
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 75%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16140 and previous config saved to /var/cache/conftool/dbconfig/20210521-051523-root.json
  • 05:14 moritzm: installing graphviz security updates on stretch
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 50%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16139 and previous config saved to /var/cache/conftool/dbconfig/20210521-050020-root.json
  • 04:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1087.eqiad.wmnet
  • 04:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1087.eqiad.wmnet
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1142', diff saved to https://phabricator.wikimedia.org/P16138 and previous config saved to /var/cache/conftool/dbconfig/20210521-044717-marostegui.json
  • 04:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1143 (re)pooling @ 25%: Repool db1143', diff saved to https://phabricator.wikimedia.org/P16137 and previous config saved to /var/cache/conftool/dbconfig/20210521-044516-root.json
  • 04:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P16136 and previous config saved to /var/cache/conftool/dbconfig/20210521-044339-marostegui.json
  • 01:27 eileen: civicrm revision changed from 35f5afb1b4 to 584b96452a, config revision is 1f8d0a6bfa
  • 01:18 eileen: civicrm revision changed from 35f5afb1b4 to 584b96452a, config revision is 1f8d0a6bfa

2021-05-20

  • 21:45 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:41 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 20:30 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 20:06 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:54 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts mwlog1001.eqiad.wmnet
  • 19:43 ppchelko@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 19:41 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 19:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16134 and previous config saved to /var/cache/conftool/dbconfig/20210520-193039-root.json
  • 19:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16133 and previous config saved to /var/cache/conftool/dbconfig/20210520-191536-root.json
  • 19:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.6
  • 19:01 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16132 and previous config saved to /var/cache/conftool/dbconfig/20210520-190031-root.json
  • 18:56 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P16131 and previous config saved to /var/cache/conftool/dbconfig/20210520-184527-root.json
  • 18:33 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: 9edb3f4: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 01m 04s)
  • 18:32 urbanecm@deploy1002: sync-file aborted: 9edb3f4: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 00m 00s)
  • 18:31 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkOnboarding.js: 7fb129f: Check if task is link-recommendation type before showing onboarding (T282826) (duration: 01m 05s)
  • 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 18:24 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:30 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:25 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:14 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:12 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 16:27 godog: upgrade grafana to 8 beta 2 on grafana2001
  • 15:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:44 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:44 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:43 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:33 moritzm: installing graphviz security updates on buster
  • 15:31 ryankemper: [cloudelastic] `ryankemper@cloudelastic1003:~$ sudo systemctl restart *search*` to clear `Check systemd state` alert on `cloudelastic1003`
  • 15:30 _joe_: test
  • 15:23 moritzm: installing graphviz security updates on buster
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:21 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P16128 and previous config saved to /var/cache/conftool/dbconfig/20210520-143825-marostegui.json
  • 13:58 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
  • 13:57 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
  • 13:52 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/upload/UploadFromStash.php: UploadFromStash: convert default user from false to null - T283196 (duration: 01m 05s)
  • 13:50 hashar@deploy1002: Synchronized php-1.37.0-wmf.6/includes/user/ActorStore.php: ActorStore: avoid throwing in case of invalid usernames T283167 (duration: 01m 05s)
  • 13:41 volans@deploy1002: Finished deploy [debmonitor/deploy@444b931]: Release v0.3.0 (duration: 01m 20s)
  • 13:39 volans@deploy1002: Started deploy [debmonitor/deploy@444b931]: Release v0.3.0
  • 12:30 kormat: Deploying wmfmariadbpy 0.7 T283228
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16126 and previous config saved to /var/cache/conftool/dbconfig/20210520-113529-root.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16125 and previous config saved to /var/cache/conftool/dbconfig/20210520-112026-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16124 and previous config saved to /var/cache/conftool/dbconfig/20210520-110522-root.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16123 and previous config saved to /var/cache/conftool/dbconfig/20210520-105018-root.json
  • 10:15 marostegui: Deploy schema change on s1 codfw, lag will appear in codfw T266486 T268392 T273360
  • 10:10 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:10 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P16122 and previous config saved to /var/cache/conftool/dbconfig/20210520-093510-marostegui.json
  • 09:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16121 and previous config saved to /var/cache/conftool/dbconfig/20210520-093257-root.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16120 and previous config saved to /var/cache/conftool/dbconfig/20210520-091754-root.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16119 and previous config saved to /var/cache/conftool/dbconfig/20210520-090250-root.json
  • 08:56 godog: move icinga-wm to libera.chat
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repool db1179', diff saved to https://phabricator.wikimedia.org/P16118 and previous config saved to /var/cache/conftool/dbconfig/20210520-084746-root.json
  • 07:44 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:41 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P16117 and previous config saved to /var/cache/conftool/dbconfig/20210520-071723-marostegui.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 100%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16116 and previous config saved to /var/cache/conftool/dbconfig/20210520-071432-root.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 75%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16115 and previous config saved to /var/cache/conftool/dbconfig/20210520-065928-root.json
  • 06:50 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 06:50 ryankemper: T283223 Write queue not draining fast enough for the next node to reboot, will finish reboot tomorrow
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 50%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16114 and previous config saved to /var/cache/conftool/dbconfig/20210520-064425-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1166 (re)pooling @ 25%: Repool db1166', diff saved to https://phabricator.wikimedia.org/P16113 and previous config saved to /var/cache/conftool/dbconfig/20210520-062921-root.json
  • 06:25 ladsgroup@deploy1002: Synchronized php-1.37.0-wmf.6/includes/PageProps.php: Backport: PageProps: be prepared that PageIdentity is not proper title (T283170) (duration: 01m 06s)
  • 06:08 elukey: powercycle ms-be2035 - no ssh available, no metrics for hours, I/O errors registered in the main tty on serial console
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 100%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16112 and previous config saved to /var/cache/conftool/dbconfig/20210520-054402-root.json
  • 05:33 ryankemper: T283223 `sudo -i cookbook sre.elasticsearch.rolling-operation cloudelastic "cloudelastic reboot" --reboot --nodes-per-run 1 --start-datetime 2021-05-20T05:16:40 --task-id T283223` on `ryankemper@cumin1001` tmux session `restart_cloudelastic`
  • 05:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 75%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16111 and previous config saved to /var/cache/conftool/dbconfig/20210520-052859-root.json
  • 05:27 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:24 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic reboot - ryankemper@cumin1001 - T283223
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1011.eqiad.wmnet
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 50%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16110 and previous config saved to /var/cache/conftool/dbconfig/20210520-051355-root.json
  • 05:13 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1011.eqiad.wmnet
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1143', diff saved to https://phabricator.wikimedia.org/P16109 and previous config saved to /var/cache/conftool/dbconfig/20210520-050025-marostegui.json
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1166', diff saved to https://phabricator.wikimedia.org/P16108 and previous config saved to /var/cache/conftool/dbconfig/20210520-045919-marostegui.json
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1141 (re)pooling @ 25%: Repool db1141', diff saved to https://phabricator.wikimedia.org/P16107 and previous config saved to /var/cache/conftool/dbconfig/20210520-045852-root.json
  • 01:01 mutante: signing puppet certs for doh2001 and doh2002.wikimedia.org (T283192)
  • 00:14 ejegg: updated fundraising CiviCRM from b3fb3c9cb0 to 35f5afb1b4
  • 00:13 ejegg: updated payments-wiki from 9f51ace546 to 6fac77f60e

2021-05-19

  • 22:44 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ sleep 3600 && mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=7200 --user=Lusccasdeutsch . # T278856 # 3 video files
  • 22:29 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2002.wikimedia.org
  • 22:27 Urbanecm: Start server-side upload for 1 video file (T283186)
  • 22:25 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:22 Urbanecm: Start server-side upload for 3 video files (T283102, T283054)
  • 22:22 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:18 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 22:12 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 14s)
  • 22:11 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh2001.wikimedia.org
  • 22:09 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache (duration: 00m 11s)
  • 22:07 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 22:00 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:58 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 21:56 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:56 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org
  • 21:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2002.wikimedia.org
  • 21:51 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host doh2001.wikimedia.org
  • 21:44 razzi@cumin1001: START - Cookbook sre.dns.netbox
  • 20:08 razzi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1125.eqiad.wmnet
  • 19:40 razzi@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1125.eqiad.wmnet
  • 18:30 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:23 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 18:23 herron@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:20 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.6 T281147
  • 18:17 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 16:13 volans: uploaded debmonitor-client_0.3.0 to apt.wikimedia.org stretch-wikimedia,buster-wikimedia,bullseye-wikimedia
  • 15:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 100%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16103 and previous config saved to /var/cache/conftool/dbconfig/20210519-154808-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 75%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16102 and previous config saved to /var/cache/conftool/dbconfig/20210519-153304-root.json
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 50%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16101 and previous config saved to /var/cache/conftool/dbconfig/20210519-151800-root.json
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1157 (re)pooling @ 25%: Repool db1157', diff saved to https://phabricator.wikimedia.org/P16100 and previous config saved to /var/cache/conftool/dbconfig/20210519-150257-root.json
  • 13:33 kormat: uploaded wmfmariadb 0.7 packages to apt
  • 13:29 hashar@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.6 (duration: 01m 05s)
  • 13:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.6
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1157', diff saved to https://phabricator.wikimedia.org/P16099 and previous config saved to /var/cache/conftool/dbconfig/20210519-131920-marostegui.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16098 and previous config saved to /var/cache/conftool/dbconfig/20210519-131012-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16097 and previous config saved to /var/cache/conftool/dbconfig/20210519-125508-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16096 and previous config saved to /var/cache/conftool/dbconfig/20210519-124004-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16095 and previous config saved to /var/cache/conftool/dbconfig/20210519-122501-root.json
  • 11:45 matthiasmullie: "EU backports done"
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16093 and previous config saved to /var/cache/conftool/dbconfig/20210519-114203-marostegui.json
  • 11:41 mlitn@deploy1002: Synchronized php-1.37.0-wmf.6/extensions/GrowthExperiments/modules: Backport: Add a link: Set contenteditable=false on mobile (T281771) (duration: 01m 06s)
  • 11:14 mlitn@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Properly enable media change tags on Wikipedias (T266067 T282822) - part 2 (duration: 01m 04s)
  • 11:13 mlitn@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Properly enable media change tags on Wikipedias (T266067 T282822) - part 1 (duration: 01m 34s)
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 100%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16091 and previous config saved to /var/cache/conftool/dbconfig/20210519-092630-root.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 75%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16090 and previous config saved to /var/cache/conftool/dbconfig/20210519-091126-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 50%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16089 and previous config saved to /var/cache/conftool/dbconfig/20210519-085622-root.json
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1175 (re)pooling @ 25%: Repool db1175', diff saved to https://phabricator.wikimedia.org/P16088 and previous config saved to /var/cache/conftool/dbconfig/20210519-084119-root.json
  • 08:28 marostegui: Stop MySQL on db1175 to upgrade kernel and mysql
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P16087 and previous config saved to /var/cache/conftool/dbconfig/20210519-082713-marostegui.json
  • 08:13 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@f514dd9]: T273847 deploying export_queries_to_relforge - starttime bump (duration: 02m 24s)
  • 08:10 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@f514dd9]: T273847 deploying export_queries_to_relforge - starttime bump
  • 07:48 zpapierski@deploy1002: Finished deploy [wikimedia/discovery/analytics@5740956]: T273847 deploying export_queries_to_relforge - index setting changes (duration: 02m 23s)
  • 07:45 zpapierski@deploy1002: Started deploy [wikimedia/discovery/analytics@5740956]: T273847 deploying export_queries_to_relforge - index setting changes
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16086 and previous config saved to /var/cache/conftool/dbconfig/20210519-074530-root.json
  • 07:42 XioNoX: roll SNMP: filter out default logical interfaces (.0) to all network devices - T283060
  • 07:38 godog: add 100G to prometheus/ops eqiad
  • 07:31 marostegui: Deploy schema change on s3 codfw, lag will appear in codfw T266486 T268392 T273360
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16085 and previous config saved to /var/cache/conftool/dbconfig/20210519-073027-root.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16084 and previous config saved to /var/cache/conftool/dbconfig/20210519-071523-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P16083 and previous config saved to /var/cache/conftool/dbconfig/20210519-070019-root.json
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1010.eqiad.wmnet
  • 06:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 T280492', diff saved to https://phabricator.wikimedia.org/P16082 and previous config saved to /var/cache/conftool/dbconfig/20210519-064343-marostegui.json
  • 06:35 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1010.eqiad.wmnet
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P16081 and previous config saved to /var/cache/conftool/dbconfig/20210519-063345-marostegui.json
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16080 and previous config saved to /var/cache/conftool/dbconfig/20210519-062824-root.json
  • 06:18 Amir1: upgrading daily-article-l to mailman3 (T282271 T280322)
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16079 and previous config saved to /var/cache/conftool/dbconfig/20210519-061321-root.json
  • 06:04 legoktm: restarted mailman3 on lists1001
  • 06:01 legoktm: stopped mailman3 service on lists1001 for schema change
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16078 and previous config saved to /var/cache/conftool/dbconfig/20210519-055817-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1141', diff saved to https://phabricator.wikimedia.org/P16077 and previous config saved to /var/cache/conftool/dbconfig/20210519-055134-marostegui.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Repool db1109', diff saved to https://phabricator.wikimedia.org/P16076 and previous config saved to /var/cache/conftool/dbconfig/20210519-054313-root.json
  • 05:17 marostegui: Compress a few tables on s3 T283125
  • 04:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109', diff saved to https://phabricator.wikimedia.org/P16075 and previous config saved to /var/cache/conftool/dbconfig/20210519-045857-marostegui.json
  • 03:03 reedy@deploy1002: Synchronized php-1.37.0-wmf.5/includes/changetags/ChangeTagsRevisionList.php: T283098 T283099 (duration: 01m 05s)
  • 03:01 reedy@deploy1002: Synchronized php-1.37.0-wmf.6/includes/changetags/ChangeTagsRevisionList.php: T283098 T283099 (duration: 02m 35s)
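
The cookbook entries above record each run's outcome as `END (PASS)` or `END (FAIL)` together with an exit code. As a rough sketch for readers who want to tally outcomes from a day's entries, the following Python parses lines in that shape. The regular expression and the function name `summarize_cookbooks` are illustrative assumptions based only on the line format visible in this log, not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Pattern inferred from the log lines on this page (an assumption, not an official format spec).
COOKBOOK_END = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>\S+)@(?P<host>\S+): "
    r"END \((?P<status>PASS|FAIL)\) - Cookbook (?P<name>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)

def summarize_cookbooks(lines):
    """Count END entries per (cookbook name, status); START lines are ignored."""
    counts = Counter()
    for line in lines:
        match = COOKBOOK_END.search(line)
        if match:
            counts[(match["name"], match["status"])] += 1
    return counts

# Entries copied from the 2021-05-19 section.
sample = [
    "  • 22:25 razzi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)",
    "  • 22:21 razzi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)",
    "  • 22:18 razzi@cumin1001: START - Cookbook sre.dns.netbox",
    "  • 22:04 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh2002.wikimedia.org",
]

summary = summarize_cookbooks(sample)
for (name, status), count in sorted(summary.items()):
    print(f"{name} {status}: {count}")
```

Counting only `END` lines avoids having to pair each `START` with its result, which would be ambiguous when the same cookbook is retried within a minute.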

2021-05-18

  • 18:40 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 05m 16s)
  • 18:35 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 18:35 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 00m 07s)
  • 18:34 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d] (thin): Regular analytics weekly train THIN [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 18:33 razzi@deploy1002: Finished deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7] (duration: 15m 39s)
  • 18:29 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3da5a8b: Update IP addresses for Wiki Education Dashboard exemptions (T283096) (duration: 01m 06s)
  • 18:26 urbanecm@deploy1002: Synchronized w/robots.php: 8224e53: robots.php: avoid using ContentHandler::getContentText() (T268041) (duration: 01m 04s)
  • 18:17 razzi@deploy1002: Started deploy [analytics/refinery@9392f1d]: Regular analytics weekly train [analytics/refinery@9392f1db6e66975304c8e9b2b7031acd3ed87fa7]
  • 16:00 kormat@cumin1001: dbctl commit (dc=all): 'db1085 being decommissioned T282096', diff saved to https://phabricator.wikimedia.org/P16073 and previous config saved to /var/cache/conftool/dbconfig/20210518-160053-kormat.json
  • 15:30 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 05s)
  • 15:23 urbanecm@deploy1002: Synchronized private/PrivateSettings.php: Update T250887 mitigations (duration: 01m 07s)
  • 14:51 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1085.eqiad.wmnet
  • 14:38 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on all wikis - T238138 (duration: 01m 06s)
  • 14:32 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.6
  • 14:32 kormat@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1085.eqiad.wmnet
  • 14:21 hashar@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.6 (duration: 79m 07s)
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16067 and previous config saved to /var/cache/conftool/dbconfig/20210518-142042-root.json
  • 14:17 moritzm: installing remaining postgresql-11 updates (client tools and libs, servers already done)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16066 and previous config saved to /var/cache/conftool/dbconfig/20210518-140538-root.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16065 and previous config saved to /var/cache/conftool/dbconfig/20210518-135034-root.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repool db1172', diff saved to https://phabricator.wikimedia.org/P16064 and previous config saved to /var/cache/conftool/dbconfig/20210518-133531-root.json
  • 13:02 hashar@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.6
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172', diff saved to https://phabricator.wikimedia.org/P16063 and previous config saved to /var/cache/conftool/dbconfig/20210518-125945-marostegui.json
  • 12:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
  • 12:43 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: new AQS node
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16062 and previous config saved to /var/cache/conftool/dbconfig/20210518-124247-root.json
  • 12:40 Krinkle: krinkle@mw1002 purge-parsercache-now.php on pc1010 (spare, depooled), ref P16060, T280605, T282761
  • 12:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16061 and previous config saved to /var/cache/conftool/dbconfig/20210518-122744-root.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16059 and previous config saved to /var/cache/conftool/dbconfig/20210518-121240-root.json
  • 12:08 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.4 (duration: 01m 28s)
  • 12:07 hashar@deploy1002: Pruned MediaWiki: 1.37.0-wmf.3 (duration: 01m 50s)
  • 12:04 hashar@deploy1002: clean aborted: Pruned MediaWiki: 1.37.0-wmf.1 (duration: 01m 16s)
  • 12:04 hashar: scap clean 1.37.0-wmf.1 1.37.0-wmf.3 and 1.37.0-wmf.4 # T281147
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Repool db1177', diff saved to https://phabricator.wikimedia.org/P16058 and previous config saved to /var/cache/conftool/dbconfig/20210518-115736-root.json
  • 11:41 moritzm: upgrading idp2001 to Java 11.0.11
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1177', diff saved to https://phabricator.wikimedia.org/P16057 and previous config saved to /var/cache/conftool/dbconfig/20210518-112942-marostegui.json
  • 10:53 moritzm: upgrade idp-test to OpenJDK 11.0.11 T281345
  • 10:27 moritzm: installing OpenJDK updates on Hadoop/Druid/AQS/kafka-Jumbo
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16056 and previous config saved to /var/cache/conftool/dbconfig/20210518-102607-root.json
  • 10:16 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
  • 10:14 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1012.eqiad.wmnet with reason: REIMAGE
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 75%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16055 and previous config saved to /var/cache/conftool/dbconfig/20210518-101104-root.json
  • 10:03 kormat: stopping mariadb on db1085 T282096
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16054 and previous config saved to /var/cache/conftool/dbconfig/20210518-095600-root.json
  • 09:47 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16053 and previous config saved to /var/cache/conftool/dbconfig/20210518-094732-kormat.json
  • 09:44 XioNoX: 👍
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Repool db1178', diff saved to https://phabricator.wikimedia.org/P16052 and previous config saved to /var/cache/conftool/dbconfig/20210518-094056-root.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1087 from dbctl T282093', diff saved to https://phabricator.wikimedia.org/P16051 and previous config saved to /var/cache/conftool/dbconfig/20210518-093552-marostegui.json
  • 09:32 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16050 and previous config saved to /var/cache/conftool/dbconfig/20210518-093228-kormat.json
  • 09:30 topranks: add peering sessions to AS8708 RCS & RDS on cr2-esams
  • 09:27 XioNoX: push test SNMP filter config on asw-a-codfw - T283060
  • 09:17 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16049 and previous config saved to /var/cache/conftool/dbconfig/20210518-091725-kormat.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1178', diff saved to https://phabricator.wikimedia.org/P16048 and previous config saved to /var/cache/conftool/dbconfig/20210518-091717-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16047 and previous config saved to /var/cache/conftool/dbconfig/20210518-091702-root.json
  • 09:04 kormat@cumin1001: dbctl commit (dc=all): 'Set db1131 to weight 400 in s6/eqiad T280751', diff saved to https://phabricator.wikimedia.org/P16046 and previous config saved to /var/cache/conftool/dbconfig/20210518-090449-kormat.json
  • 09:02 kormat@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P16045 and previous config saved to /var/cache/conftool/dbconfig/20210518-090215-kormat.json
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16044 and previous config saved to /var/cache/conftool/dbconfig/20210518-090159-root.json
  • 09:01 kormat@cumin1001: dbctl commit (dc=all): 'Remove s6 eqiad primary from 'api' group T280751', diff saved to https://phabricator.wikimedia.org/P16043 and previous config saved to /var/cache/conftool/dbconfig/20210518-090156-kormat.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16042 and previous config saved to /var/cache/conftool/dbconfig/20210518-084643-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 25%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P16041 and previous config saved to /var/cache/conftool/dbconfig/20210518-083139-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P16040 and previous config saved to /var/cache/conftool/dbconfig/20210518-075532-marostegui.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 100%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16039 and previous config saved to /var/cache/conftool/dbconfig/20210518-075458-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 75%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16038 and previous config saved to /var/cache/conftool/dbconfig/20210518-073955-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 50%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16037 and previous config saved to /var/cache/conftool/dbconfig/20210518-072451-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1111 (re)pooling @ 25%: Repool db1111', diff saved to https://phabricator.wikimedia.org/P16036 and previous config saved to /var/cache/conftool/dbconfig/20210518-070947-root.json
  • 07:06 marostegui: Deploy schema change on s4 codfw, lag will appear in codfw T266486 T268392 T273360
  • 06:54 XioNoX: Homerify cloudsw ospf
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1111', diff saved to https://phabricator.wikimedia.org/P16035 and previous config saved to /var/cache/conftool/dbconfig/20210518-064426-marostegui.json
  • 06:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1083.eqiad.wmnet
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16034 and previous config saved to /var/cache/conftool/dbconfig/20210518-064033-root.json
  • 06:33 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1083.eqiad.wmnet
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1083 from dbctl T281445', diff saved to https://phabricator.wikimedia.org/P16033 and previous config saved to /var/cache/conftool/dbconfig/20210518-062947-marostegui.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16032 and previous config saved to /var/cache/conftool/dbconfig/20210518-062529-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16031 and previous config saved to /var/cache/conftool/dbconfig/20210518-061026-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repool db1114', diff saved to https://phabricator.wikimedia.org/P16030 and previous config saved to /var/cache/conftool/dbconfig/20210518-055522-root.json
  • 05:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts labsdb1009.eqiad.wmnet
  • 05:42 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts labsdb1009.eqiad.wmnet
  • 05:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1106.eqiad.wmnet with reason: REIMAGE
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114', diff saved to https://phabricator.wikimedia.org/P16029 and previous config saved to /var/cache/conftool/dbconfig/20210518-052324-marostegui.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106', diff saved to https://phabricator.wikimedia.org/P16028 and previous config saved to /var/cache/conftool/dbconfig/20210518-050949-marostegui.json
  • 05:06 marostegui: Restart db1115 mysql
  • 00:56 eileen: civicrm revision changed from 38ac15233f to b3fb3c9cb0, config revision is 1f8d0a6bfa
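
Many of the `dbctl commit` entries above show a host being repooled in steps (25%, 50%, 75%, 100%). The sketch below extracts those steps for each instance and computes the gap between consecutive steps. It assumes the line shape seen in this log; `repool_steps` and `step_gaps` are hypothetical helper names, and the sample lines are trimmed after the percentage (the diff URL and saved-config path are omitted for brevity).

```python
import re
from collections import defaultdict

# Pattern inferred from the "(re)pooling @ N%" entries on this page (an assumption).
REPOOL = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+: dbctl commit \(dc=all\): "
    r"'(?P<instance>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)

def repool_steps(lines):
    """Return {instance: [(HH:MM, percent), ...]} sorted oldest first."""
    steps = defaultdict(list)
    for line in lines:
        match = REPOOL.search(line)
        if match:
            steps[match["instance"]].append((match["time"], int(match["pct"])))
    # The log is newest-first; sorting HH:MM strings is only valid within one day.
    return {instance: sorted(entries) for instance, entries in steps.items()}

def step_gaps(entries):
    """Minutes between consecutive repool steps of one instance."""
    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)
    times = [to_minutes(t) for t, _ in entries]
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Entries from the 2021-05-18 section, trimmed after the percentage.
sample = [
    "  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%",
    "  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%",
    "  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%",
    "  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%",
    "  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1172'",
]

steps = repool_steps(sample)
gaps = step_gaps(steps["db1172"])
print(steps["db1172"], gaps)
```

This sketch treats all steps for an instance on a given day as one sequence, so a host repooled twice in the same day would need its entries split at each `Depool` line first.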

2021-05-17

  • 23:33 urbanecm@deploy1002: update-interwiki-cache aborted: Update interwiki cache for Beta Cluster (duration: 00m 01s)
  • 23:27 urbanecm@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 55s)
  • 21:46 sbassett: Deployed security patch (and ran scap sync-l10n) for T260865
  • 19:45 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Finalize WikidataCompletionSearchClicks Event Platform migration - T282140 (duration: 00m 58s)
  • 19:13 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to Event Platform on group 0 and group 1 - T238138 (duration: 00m 59s)
  • 18:27 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/skins/Vector/includes/FeatureManagement/Requirements/LanguageInHeaderTreatmentRequirement.php: e180b99: Allow `languageinheader` query param to fully control treatment of languages (T282543) (duration: 00m 58s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/throttle.php: c30f92b5: Remove expired throttle rule (duration: 00m 59s)
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16022 and previous config saved to /var/cache/conftool/dbconfig/20210517-165322-root.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16021 and previous config saved to /var/cache/conftool/dbconfig/20210517-163819-root.json
  • 16:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16020 and previous config saved to /var/cache/conftool/dbconfig/20210517-162315-root.json
  • 16:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112', diff saved to https://phabricator.wikimedia.org/P16019 and previous config saved to /var/cache/conftool/dbconfig/20210517-160811-root.json
  • 15:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 100%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16018 and previous config saved to /var/cache/conftool/dbconfig/20210517-153311-root.json
  • 15:27 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.37.0-wmf.5
  • 15:26 elukey@deploy1002: Finished deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359 (duration: 19m 48s)
  • 15:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 75%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16017 and previous config saved to /var/cache/conftool/dbconfig/20210517-151807-root.json
  • 15:06 elukey@deploy1002: Started deploy [ores/deploy@3e1ff5f]: Update editquality submodule after Turkish Wikipedia's labelling campaign - T257359
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 50%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16016 and previous config saved to /var/cache/conftool/dbconfig/20210517-150303-root.json
  • 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:53 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:50 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1099:3318 (re)pooling @ 25%: Repool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16015 and previous config saved to /var/cache/conftool/dbconfig/20210517-144800-root.json
  • 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 14:41 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318', diff saved to https://phabricator.wikimedia.org/P16014 and previous config saved to /var/cache/conftool/dbconfig/20210517-141737-marostegui.json
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 100%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16013 and previous config saved to /var/cache/conftool/dbconfig/20210517-141627-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16012 and previous config saved to /var/cache/conftool/dbconfig/20210517-140438-root.json
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 100%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16011 and previous config saved to /var/cache/conftool/dbconfig/20210517-140435-root.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 75%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16010 and previous config saved to /var/cache/conftool/dbconfig/20210517-140123-root.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16009 and previous config saved to /var/cache/conftool/dbconfig/20210517-134934-root.json
  • 13:49 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 75%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16008 and previous config saved to /var/cache/conftool/dbconfig/20210517-134931-root.json
  • 13:47 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1131.eqiad.wmnet with reason: REIMAGE
  • 13:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 50%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16007 and previous config saved to /var/cache/conftool/dbconfig/20210517-134619-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16006 and previous config saved to /var/cache/conftool/dbconfig/20210517-133431-root.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 50%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16005 and previous config saved to /var/cache/conftool/dbconfig/20210517-133427-root.json
  • 13:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3318 (re)pooling @ 25%: Repool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16004 and previous config saved to /var/cache/conftool/dbconfig/20210517-133116-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P16003 and previous config saved to /var/cache/conftool/dbconfig/20210517-131927-root.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3314 (re)pooling @ 25%: Repool db1144:3314', diff saved to https://phabricator.wikimedia.org/P16002 and previous config saved to /var/cache/conftool/dbconfig/20210517-131924-root.json
  • 13:10 marostegui: Upgrade kernel and mysql (10.4.19) on db1144:3314, db1144:3315
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3314, db1144:3315 for kernel and mysql upgrade', diff saved to https://phabricator.wikimedia.org/P16001 and previous config saved to /var/cache/conftool/dbconfig/20210517-130935-marostegui.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3318', diff saved to https://phabricator.wikimedia.org/P16000 and previous config saved to /var/cache/conftool/dbconfig/20210517-125742-marostegui.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15999 and previous config saved to /var/cache/conftool/dbconfig/20210517-123548-root.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15998 and previous config saved to /var/cache/conftool/dbconfig/20210517-122045-root.json
  • 12:08 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:07 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15997 and previous config saved to /var/cache/conftool/dbconfig/20210517-120541-root.json
  • 12:04 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 11:55 marostegui: Deploy schema change on s8 codfw, lag will appear in codfw T266486 T268392 T273360
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15996 and previous config saved to /var/cache/conftool/dbconfig/20210517-115037-root.json
  • 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswikibooks --fix
  • 11:50 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=mswiki --fix
  • 11:49 Urbanecm: 11:49:22 Synchronized wmf-config/InitialiseSettings.php: a73fe2d: Make the Malaysian talk namespaces names consistent (duration: 01m 08s)
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1e06f83: Enable SandboxLink at azwiki (T282954) (duration: 01m 08s)
  • 11:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 32e4343: urwiki: Grant `editprotected` to eliminators (T281274) (duration: 01m 08s)
  • 11:17 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 36d29a6: Enable NewUserMessage on ptwikinews (T282845) (duration: 01m 09s)
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P15995 and previous config saved to /var/cache/conftool/dbconfig/20210517-111343-marostegui.json
  • 11:07 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/{bnwiki,bnwiki-1.5x,bnwiki-2x}.png (T282886)
  • 11:07 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:07 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on aqs1012.eqiad.wmnet with reason: Testing removing from new AQS cluster
  • 11:06 urbanecm@deploy1002: Synchronized static/images/project-logos/: b1da7aa: Update bnwiki project logo (T282886) (duration: 01m 42s)
  • 11:03 Urbanecm: [urbanecm@mwmaint1002 ~/uploads]$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Lusccasdeutsch . # T278856
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 100%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15994 and previous config saved to /var/cache/conftool/dbconfig/20210517-103823-root.json
  • 10:37 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 07s)
  • 10:36 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 08s)
  • 10:30 moritzm: installing postgresql-11 security updates
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 75%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15993 and previous config saved to /var/cache/conftool/dbconfig/20210517-102319-root.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 50%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15992 and previous config saved to /var/cache/conftool/dbconfig/20210517-100815-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1127 (re)pooling @ 25%: Repool db1127', diff saved to https://phabricator.wikimedia.org/P15991 and previous config saved to /var/cache/conftool/dbconfig/20210517-095312-root.json
  • 09:43 hashar: Restarted CI Jenkins to update the instant-messaging and ircbot plugins # T271122
  • 09:33 moritzm: installing libimage-exiftool-perl security updates
  • 09:29 topranks: push CR691140 to eqiad and codfw core routers - T282809
  • 09:18 hashar: Restarting CI Jenkins to upgrade the Gearman plugin # T281737
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1127', diff saved to https://phabricator.wikimedia.org/P15990 and previous config saved to /var/cache/conftool/dbconfig/20210517-091636-marostegui.json
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 100%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15989 and previous config saved to /var/cache/conftool/dbconfig/20210517-091604-root.json
  • 09:06 ema: cp_eqsin: run confd-reload-vcl manually to fix /var/run/reload-vcl-state T282880
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 75%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15988 and previous config saved to /var/cache/conftool/dbconfig/20210517-090101-root.json
  • 08:52 vgutierrez: pool cp5016
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 50%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15987 and previous config saved to /var/cache/conftool/dbconfig/20210517-084557-root.json
  • 08:45 vgutierrez: depool cp5016
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3317 (re)pooling @ 25%: Repool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15986 and previous config saved to /var/cache/conftool/dbconfig/20210517-083053-root.json
  • 08:28 Urbanecm: wikiadmin@10.64.48.109(centralauth)> delete from global_group_restrictions where ggr_group="Indic_Bots"; # T282968
  • 08:26 urbanecm@deploy1002: Synchronized wmf-config/logos.php: 93e61f7: Use svwiki 20th anniversary logos (T282389) (duration: 01m 08s)
  • 08:24 urbanecm@deploy1002: Synchronized static/images/project-logos/: 0f356a3: Add svwiki 20th anniversary logos (T282389) (duration: 01m 12s)
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3317', diff saved to https://phabricator.wikimedia.org/P15985 and previous config saved to /var/cache/conftool/dbconfig/20210517-061232-marostegui.json
  • 06:01 kormat: restarting mariadb on db1131 to pick up report_host T266483
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 100%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15984 and previous config saved to /var/cache/conftool/dbconfig/20210517-055556-root.json
  • 05:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1079.eqiad.wmnet
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 75%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15983 and previous config saved to /var/cache/conftool/dbconfig/20210517-054053-root.json
  • 05:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1079.eqiad.wmnet
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 50%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15982 and previous config saved to /var/cache/conftool/dbconfig/20210517-052549-root.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1079 from dbctl T282079', diff saved to https://phabricator.wikimedia.org/P15981 and previous config saved to /var/cache/conftool/dbconfig/20210517-051728-marostegui.json
  • 05:13 kormat@cumin1001: dbctl commit (dc=all): 'Depool db1131 until it's reimaged to buster T282124', diff saved to https://phabricator.wikimedia.org/P15980 and previous config saved to /var/cache/conftool/dbconfig/20210517-051312-kormat.json
  • 05:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1124 (re)pooling @ 25%: Repool db1124', diff saved to https://phabricator.wikimedia.org/P15979 and previous config saved to /var/cache/conftool/dbconfig/20210517-051045-root.json
  • 05:07 kormat@cumin1001: dbctl commit (dc=all): 'Promote db1173 to s6 master and set section read-write T282124', diff saved to https://phabricator.wikimedia.org/P15978 and previous config saved to /var/cache/conftool/dbconfig/20210517-050740-kormat.json
  • 05:05 kormat@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T282124', diff saved to https://phabricator.wikimedia.org/P15977 and previous config saved to /var/cache/conftool/dbconfig/20210517-050526-kormat.json
  • 05:05 kormat: Starting s6 eqiad failover from db1131 to db1173 - T282124
  • 04:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
  • 04:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1112.eqiad.wmnet with reason: REIMAGE
  • 04:46 kormat@cumin1001: dbctl commit (dc=all): 'Set db1173 with weight 0 T282124', diff saved to https://phabricator.wikimedia.org/P15976 and previous config saved to /var/cache/conftool/dbconfig/20210517-044657-kormat.json
  • 04:46 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
  • 04:46 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Master switchover s6 T282124
  • 04:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T280492', diff saved to https://phabricator.wikimedia.org/P15975 and previous config saved to /var/cache/conftool/dbconfig/20210517-043551-marostegui.json
  • 04:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1124', diff saved to https://phabricator.wikimedia.org/P15974 and previous config saved to /var/cache/conftool/dbconfig/20210517-043148-marostegui.json
  • 02:10 legoktm: uninstalled python3-dbg on lists1001
  • 01:31 legoktm: restarted mailman3-web
  • 00:13 legoktm: installing python3-dbg on lists1001

2021-05-16

  • 22:45 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=tawiki wikilove # T280326
  • 20:46 legoktm: restarted mailman3-web
  • 19:38 legoktm: restarted mailman3-web
  • 17:29 Amir1: restart mailman3-web
  • 02:39 legoktm: restarting mailman3-web on lists1001 again
  • 00:53 legoktm: restarted mailman3-web on lists1001, uwsgi looked like it got stuck, consuming all CPU/memory

2021-05-15

  • 12:33 Amir1: set fr_quality to 0 for all revisions on several wikis (T279761)
  • 06:54 Amir1: migrating most of the last mailing lists of T280322

2021-05-14

  • 20:42 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1002.eqiad.wmnet
  • 20:32 mutante: people1002 - decom'ing - please use people1003 and see list mail
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1002.eqiad.wmnet
  • 18:58 cdanis@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 18:58 cdanis@cumin1001: START - Cookbook sre.network.cf
  • 18:39 cdanis: ✔️ cdanis@install1003.wikimedia.org ~ 🕝☕ sudo systemctl restart squid.service
  • 18:14 mutante: people1003/people2002: awk -F: '$6 ~ "^\/home" {print $1,$6}' /etc/passwd | while read line ; do user=${line% *}; dir=${line#* }; sudo mkdir -p ${dir}/public_html; sudo chown $user ${dir}/public_html; done (courtesy of Jbond)
  • 17:49 bblack: install1003 - restored normal resolv.conf + re-enabled+ran puppet
  • 17:41 bblack: install1003 - restart squid
  • 17:35 bblack: install1003 - puppet disabled and /etc/resolv.conf manually patched over to deal with a current issue
  • 17:25 cdanis: rolled back cr1-eqiad/cr2-eqiad interface disables T282881
  • 17:10 cdanis: cdanis@re0.cr1-eqiad# set interfaces gr-3/3/0.1 disable # T282881
  • 17:03 cdanis: cdanis@re0.cr2-eqiad# set interfaces gr-4/3/0.2 disable # T282881
  • 15:22 cdanis@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 15:22 cdanis@cumin2002: START - Cookbook sre.network.cf
  • 15:05 Urbanecm: Start server-side upload for 1 video file (T282874)
  • 14:09 andrew@deploy1002: Finished deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard (duration: 04m 15s)
  • 14:04 andrew@deploy1002: Started deploy [horizon/deploy@5d0a683]: removing 'locality' from trove dashboard
  • 12:54 bblack: re-running puppet agent on cp5*
  • 12:19 jbond42: run puppet on CP servers
  • 04:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisionlist/RevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/revisiondelete/RevDelRevisionItem.php: fix deprecation warning T282825 (duration: 01m 07s)
  • 04:18 ariel@deploy1002: Finished deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path (duration: 00m 03s)
  • 04:18 ariel@deploy1002: Started deploy [dumps/dumps@b97a2a9]: eliminate double slash in construction of api path
  • 03:25 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/MapSources/includes/specials/MapSourcesPage.php: fix PHP notice T282833 (duration: 01m 07s)
  • 03:20 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/WikiPage.php: T282844 (duration: 01m 06s)
  • 03:18 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/page/PageArchive.php: T282844 (duration: 01m 07s)
  • 03:16 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/Revision/RevisionArchiveRecord.php: fix DeletedContributions breakage T282844 (duration: 01m 07s)
  • 03:13 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/logging/LogEventsList.php: fix PHP notice T282834 (duration: 01m 08s)
  • 00:39 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`

2021-05-13

  • 23:53 mutante: [sodium:~] $ sudo systemctl start update-ubuntu-mirror.service
  • 23:50 mutante: [sodium:~] $ sudo -u mirror /usr/local/sbin/update-ubuntu-mirror
  • 23:22 thcipriani@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikimediaEvents: Backport: Fix "final_state: vector" bug in VectorPrefDiffInstrumentation (T261842) (duration: 01m 07s)
  • 23:11 thcipriani@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable WikiLove extension on tawiki (T280326) (duration: 01m 07s)
  • 23:10 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 --new wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1003.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 20:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: REVERT: 9dc74e4: Revert "Enable media change tags on wikipedias" (T266067, T282822) (duration: 01m 07s)
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:09 herron@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 20:08 herron@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-logging-external' for release 'production' .
  • 19:43 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:42 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:39 dancy@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GeoData/includes/Hooks.php: Backport: Make sure mId exists (T282735) (duration: 01m 08s)
  • 19:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 80e5b9d: cd113a7: Enable structured_task/article/link_suggestion_interaction schema (T278177) (duration: 01m 06s)
  • 18:59 Urbanecm: Morning B&C is going to take a few more minutes
  • 18:36 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2001.codfw.wmnet
  • 18:35 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 0856ae1: ca52e78: GrowthExperiments backports (T282711, T282175) (duration: 01m 08s)
  • 18:26 mutante: people2001 is going down - people1003 (eqiad) and people2002 (codfw) are your replacements on bullseye
  • 18:25 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2001.codfw.wmnet
  • 18:22 Urbanecm: Start server-side upload for 2 video files (T282643, T282644)
  • 18:21 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4cd6a78: Growth features: Push elwiki and cawiki out of dark mode (T280673; T280172) (duration: 01m 07s)
  • 18:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 04eb9d3: Enable media change tags on wikipedias (T266067) (duration: 01m 07s)
  • 18:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: b3300c3: 59c8448: Enable Extension:MediaSearch on (test)commons (T265939) (duration: 01m 08s)
  • 17:20 andrew@deploy1002: Finished deploy [horizon/deploy@3d160f6]: Adding Database dashboards (duration: 04m 08s)
  • 17:16 andrew@deploy1002: Started deploy [horizon/deploy@3d160f6]: Adding Database dashboards
  • 16:36 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1005 back to config (T273278) (duration: 01m 07s)
  • 16:26 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 16:24 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 16:24 effie: rebooting poolcounter1005
  • 16:09 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1005 will be rebooted for updates (T273278) (duration: 01m 07s)
  • 15:58 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: add poolcounter1004 back to config (T273278) (duration: 01m 07s)
  • 15:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 15:46 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 15:46 effie: restarting poolcounter1004
  • 15:27 jiji@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: ProductionServices: poolcounter1004 will be rebooted for updates (T273278) (duration: 01m 08s)
  • 14:49 Urbanecm: Start server-side upload for 1 video file (T282785)
  • 14:07 Urbanecm: Start server-side upload for 3 video files (T282558, T282556)
  • 12:40 tgr@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments: Backport: instrumentation patches (gerrit:690070 gerrit:690071 gerrit:690072 gerrit:690073) (T278116 T278117 T278114 T278177 T278487 T278112 T278111 T278118) (duration: 01m 09s)
  • 11:00 hnowlan: deleting packages still referenced by jessie components: `sudo -i reprepro clearvanished --delete`
  • 10:46 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 10:40 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 10:31 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:25 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 10:11 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:47 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 08:45 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 08:21 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 07:43 kevinbazira@deploy1002: Finished deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723 (duration: 32m 50s)
  • 07:10 kevinbazira@deploy1002: Started deploy [ores/deploy@8fd23ed]: Regular ORES Deployment T278723
  • 05:54 _joe_: running docker image prune on contint1001, which has 722 unlinked images stored in its docker daemon
  • 01:20 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)

2021-05-12

  • 23:48 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/WikiEditor/includes/WikiEditorHooks.php: 2f6af514c49d47bbec5ce51f9f7263015e039003? PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 07s)
  • 23:40 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/WikiEditor/includes/WikiEditorHooks.php: ef41396: PHP VisualEditorFeatureUse logging: properly record session id (T281409) (duration: 01m 08s)
  • 23:27 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 23:27 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 22:01 ryankemper@cumin2001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:56 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:56 ryankemper@cumin2001: START - Cookbook sre.wdqs.data-transfer
  • 21:54 ryankemper: T280382 `wdqs1012.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 20:57 ottomata: starting new drop_event data purge job to drop all event data older than 90 days in the Hive event database - T273789
  • 20:33 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:27 ryankemper@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:25 ryankemper@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:15 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:11 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:10 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1012.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `wdqs_reimage`
  • 19:09 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:07 ryankemper@cumin2001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 19:06 dancy@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.5 (duration: 01m 06s)
  • 19:05 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.5
  • 19:05 ryankemper: T280382 T281437 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2007.codfw.wmnet` on `ryankemper@cumin2001` tmux session `wdqs_reimage`
  • 19:00 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin2001` tmux session `elastic_restarts`
  • 19:00 ryankemper@cumin2001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin2001 - T280563
  • 18:59 ryankemper: [Elastic] Restarted `*search*` services on `elastic2058`
  • 18:48 mutante: rsyncing home dirs of people1003 over to people2002 as well (T280989)
  • 18:42 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 3999be1: Add Link: refine exclusion rules for finding link text matches (duration: 01m 08s)
  • 18:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eb65aff: Update wordmark and tagline for kawiki (T278251; 2/2) (duration: 01m 09s)
  • 18:26 urbanecm@deploy1002: Synchronized static/images/mobile/: eb65aff: Update wordmark and tagline for kawiki (T278251; 1/2) (duration: 01m 06s)
  • 18:25 urbanecm@deploy1002: sync-file aborted: eb65aff: Update wordmark and tagline for kawiki (T278251) (duration: 00m 00s)
  • 18:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 0cd3297: Disable Education Program namespaces in cswiki (T282691) (duration: 01m 15s)
  • 18:11 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/includes/skins/SkinTemplate.php: 7f14913: Modern keys must be unset (T282646) (duration: 01m 08s)
  • 18:05 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 11defd4: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 23s)
  • 16:15 hnowlan: including envoyproxy_1.15.5-1_amd64.changes with reprepro
  • 15:51 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudnet2003-dev.codfw.wmnet
  • 14:45 aborrero@cumin2001: START - Cookbook sre.hosts.decommission for hosts cloudnet2003-dev.codfw.wmnet
  • 14:02 marostegui: Upgraded mysql on clouddb1015
  • 14:01 marostegui: Upgraded mysql on clouddb1014
  • 13:57 kormat: uploaded wmfmariadbpy 0.6.1 for bullseye
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15950 and previous config saved to /var/cache/conftool/dbconfig/20210512-133248-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15949 and previous config saved to /var/cache/conftool/dbconfig/20210512-131745-root.json
  • 13:15 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:13 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 13:06 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:05 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Test deploy procedure on cumin2002 - volans@cumin2002
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15948 and previous config saved to /var/cache/conftool/dbconfig/20210512-130239-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repool db1174', diff saved to https://phabricator.wikimedia.org/P15947 and previous config saved to /var/cache/conftool/dbconfig/20210512-124736-root.json
  • 12:44 aborrero@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:42 aborrero@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2004-dev.codfw.wmnet with reason: REIMAGE
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P15946 and previous config saved to /var/cache/conftool/dbconfig/20210512-121004-marostegui.json
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 100%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15945 and previous config saved to /var/cache/conftool/dbconfig/20210512-120746-root.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 75%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15944 and previous config saved to /var/cache/conftool/dbconfig/20210512-115242-root.json
  • 11:43 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.5/extensions/GrowthExperiments/: 6cc2530: c268d08: b89592e: 7620953: 8fd7610: GrowthExperiments backports (duration: 01m 17s)
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 50%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15943 and previous config saved to /var/cache/conftool/dbconfig/20210512-113737-root.json
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1101:3317 (re)pooling @ 25%: Repool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15942 and previous config saved to /var/cache/conftool/dbconfig/20210512-112234-root.json
  • 11:09 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9939edb: zhwikinews: Allow sysops to grant/revoke transwiki group (T273405) (duration: 02m 17s)
  • 10:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:46 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudvirt1038.eqiad.wmnet with reason: T276922
  • 10:32 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 10:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 10:29 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 10:14 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 10:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 10:01 effie: reboot poolcounter2003 and poolcounter2004
  • 09:55 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317', diff saved to https://phabricator.wikimedia.org/P15940 and previous config saved to /var/cache/conftool/dbconfig/20210512-093333-marostegui.json
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 100%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15939 and previous config saved to /var/cache/conftool/dbconfig/20210512-093308-root.json
  • 09:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1074.eqiad.wmnet
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 75%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15938 and previous config saved to /var/cache/conftool/dbconfig/20210512-091804-root.json
  • 09:10 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1074.eqiad.wmnet
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 50%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15937 and previous config saved to /var/cache/conftool/dbconfig/20210512-090301-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1098:3317 (re)pooling @ 25%: Repool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15936 and previous config saved to /var/cache/conftool/dbconfig/20210512-084757-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1074 from dbctl T281959', diff saved to https://phabricator.wikimedia.org/P15935 and previous config saved to /var/cache/conftool/dbconfig/20210512-084755-marostegui.json
  • 08:23 jbond42: rolling restart of ats
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15934 and previous config saved to /var/cache/conftool/dbconfig/20210512-071017-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317', diff saved to https://phabricator.wikimedia.org/P15933 and previous config saved to /var/cache/conftool/dbconfig/20210512-070202-marostegui.json
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15932 and previous config saved to /var/cache/conftool/dbconfig/20210512-065513-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15931 and previous config saved to /var/cache/conftool/dbconfig/20210512-064009-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15930 and previous config saved to /var/cache/conftool/dbconfig/20210512-062506-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P15929 and previous config saved to /var/cache/conftool/dbconfig/20210512-062118-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2121 and db2108 in s7 T282535', diff saved to https://phabricator.wikimedia.org/P15928 and previous config saved to /var/cache/conftool/dbconfig/20210512-062046-marostegui.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15927 and previous config saved to /var/cache/conftool/dbconfig/20210512-061702-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Move db2148 to also serve vslow in s2 T282535', diff saved to https://phabricator.wikimedia.org/P15926 and previous config saved to /var/cache/conftool/dbconfig/20210512-060817-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15925 and previous config saved to /var/cache/conftool/dbconfig/20210512-060158-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15924 and previous config saved to /var/cache/conftool/dbconfig/20210512-054655-root.json
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repool db1181', diff saved to https://phabricator.wikimedia.org/P15923 and previous config saved to /var/cache/conftool/dbconfig/20210512-053151-root.json
  • 05:00 marostegui: Stop MySQL on labsdb1009 labsdb1010 labsdb1011 T282524 T282523 T282522
  • 04:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1181', diff saved to https://phabricator.wikimedia.org/P15922 and previous config saved to /var/cache/conftool/dbconfig/20210512-044728-marostegui.json
  • 04:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2121 T282535', diff saved to https://phabricator.wikimedia.org/P15920 and previous config saved to /var/cache/conftool/dbconfig/20210512-044222-marostegui.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2108 T282535', diff saved to https://phabricator.wikimedia.org/P15919 and previous config saved to /var/cache/conftool/dbconfig/20210512-044109-marostegui.json
  • 04:38 marostegui: Drop testing mailman3 databases T281548
  • 04:36 Amir1: importing archives of wikitech-l (T280322)
  • 01:42 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:42 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on people2002.codfw.wmnet with reason: new host
  • 01:35 mutante: people2002 - created new VM resembling people2001, signed puppet cert request, initial puppet run T280989
  • 01:19 tstarling@deploy1002: Synchronized php-1.37.0-wmf.5/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 08s)
  • 01:17 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specialpage/ChangesListSpecialPage.php: T282183 fix hidemyself in RC and watchlist (duration: 01m 16s)
  • 00:54 mutante: made public_html dirs on people1002 readonly to make it obvious it is not the active backend anymore
  • 00:51 mutante: [people1002:/home] $ sudo find . -type d -name public_html -exec chmod 555 {} \;
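
For readers unfamiliar with the `find ... -exec chmod 555` command logged at 00:51, here is a rough Python equivalent. It is only an illustrative sketch, not what was run on people1002; the function name `make_public_html_readonly` is hypothetical. Like `find -type d`, it skips symbolic links.

```python
import os

def make_public_html_readonly(root):
    """Set mode 555 (r-xr-xr-x) on every real directory named public_html under root."""
    changed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name != "public_html":
                continue
            path = os.path.join(dirpath, name)
            # find -type d does not match symlinks, so skip them here as well
            if os.path.islink(path):
                continue
            os.chmod(path, 0o555)
            changed.append(path)
    return changed
```

Mode 555 removes write permission for everyone while keeping the directories listable and traversable, which matches the stated intent of making it obvious that the host is no longer the active backend.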

2021-05-11

  • 23:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: ec37795: Change namespace names and aliases on tiwiki and tiwiktionary (T263840) (duration: 01m 07s)
  • 23:16 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5bc40ac: ptwiki: Use celebration logos in new vector (T281925) (duration: 01m 06s)
  • 23:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: eac843a: Make DT source mode toolbar available as beta on all wikis (T279124) (duration: 01m 12s)
  • 23:06 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/wikipedia-pt-20.png: 60e6e4e: ptwiki: Add wikipedia-pt-20.png (T281925) (duration: 01m 08s)
  • 23:02 urbanecm@deploy1002: Synchronized static/images/mobile/copyright/: e35199b: Adding square logo and wordmark for ptwiki 20 years celebration (T281925) (duration: 01m 50s)
  • 22:14 legoktm@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lists1002.wikimedia.org
  • 22:05 legoktm@cumin1001: START - Cookbook sre.hosts.decommission for hosts lists1002.wikimedia.org
  • 21:37 Urbanecm: Start server-side upload for 3 video files (T282566, T282565, T282559)
  • 21:37 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 21:34 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1012.eqiad.wmnet with reason: REIMAGE
  • 20:52 legoktm: upgraded mailman3 on lists1001
  • 20:37 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2002.codfw.wmnet
  • 20:24 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 06m 57s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 00m 05s)
  • 20:17 mforns@deploy1002: Started deploy [analytics/refinery@270c753] (thin): Regular analytics weekly train THIN [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 20:17 mforns@deploy1002: Finished deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795] (duration: 17m 01s)
  • 20:00 mforns@deploy1002: Started deploy [analytics/refinery@270c753]: Regular analytics weekly train [analytics/refinery@270c753fc746b979cf90e1537f9a67ede6372795]
  • 19:55 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2002.codfw.wmnet
  • 19:46 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 09m 45s)
  • 19:37 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:33 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.5
  • 19:29 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 00m 07s)
  • 19:29 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d] (thin): Regular analytics weekly train THIN [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 19:28 mforns@deploy1002: Finished deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b] (duration: 45m 45s)
  • 18:55 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:53 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Migrate VirtualPageView to EventPlatform on testwiki - T238138 (duration: 01m 09s)
  • 18:52 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1011.eqiad.wmnet with reason: REIMAGE
  • 18:43 mforns@deploy1002: Started deploy [analytics/refinery@7e0598d]: Regular analytics weekly train [analytics/refinery@7e0598d3f0805bf3dda4e01b637d95c16a6a668b]
  • 18:20 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.5 (duration: 09m 43s)
  • 18:10 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 17:36 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 01m 25s)
  • 17:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:35 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:34 andrew@deploy1002: Finished deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again) (duration: 02m 27s)
  • 17:33 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 17:32 andrew@deploy1002: Started deploy [horizon/deploy@acc3c68]: testing default policy deployment in codfw1dev (again)
  • 17:31 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev (duration: 01m 59s)
  • 17:29 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: testing default policy deployment in codfw1dev
  • 17:20 mutante: the backend for people.wikimedia.org switched from people1002 to people1003, the people.wikimedia.org CNAME has been updated. MOTD is about to be updated to inform users.
  • 17:18 legoktm: disabled pipermail redirects on lists.wikimedia.org
  • 17:07 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 16:12 jynus: restarting bacula-dir on backup1001, stuck process
  • 15:59 dancy@deploy1002: rebuilt and synchronized wikiversions files: (no justification provided)
  • 15:58 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog1001.eqiad.wmnet
  • 15:55 bstorm: restart haproxy on dbproxy1018/9 to remove old config
  • 15:47 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog1001.eqiad.wmnet
  • 15:38 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwlog2001.codfw.wmnet
  • 15:37 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:36 dancy@deploy1002: sync-world aborted: testwikis wikis to 1.37.0-wmf.4 (duration: 02m 04s)
  • 15:34 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 15:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 dancy@deploy1002: scap failed: RuntimeError scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details) (duration: 17m 36s)
  • 15:31 dancy@deploy1002: scap failed: average error rate on 9/9 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/83629bcb5560d11e61d3085c89dd9ed6 for details)
  • 15:27 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts mwlog2001.codfw.wmnet
  • 15:24 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 15:13 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.5
  • 15:03 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:57 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on logstash1010.eqiad.wmnet with reason: REIMAGE
  • 14:49 moritzm: installing busybox security updates
  • 14:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:31 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:29 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:27 moritzm: installing cgal security updates
  • 14:26 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 hashar: Restarted CI Jenkins with a snapshot of the Gearman Jenkins plugin # T281737
  • 14:10 hashar: Restarted CI Jenkins for plugin upgrade # T282433
  • 14:05 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:01 hashar: Restarted releases Jenkins for plugin upgrade # T282433
  • 13:47 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 1d4d007: enwiki: Growth features: Change help panel links (T281896) (duration: 01m 02s)
  • 13:39 jbond42: rolling restart of ats-backend
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mc1027.eqiad.wmnet
  • 12:11 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mc1027.eqiad.wmnet
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15913 and previous config saved to /var/cache/conftool/dbconfig/20210511-114540-root.json
  • 11:35 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15912 and previous config saved to /var/cache/conftool/dbconfig/20210511-113036-root.json
  • 11:16 ladsgroup@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Add P2671 and P4839 to deprecated properties list (T280779) (duration: 00m 58s)
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15911 and previous config saved to /var/cache/conftool/dbconfig/20210511-111532-root.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repool db1162', diff saved to https://phabricator.wikimedia.org/P15910 and previous config saved to /var/cache/conftool/dbconfig/20210511-110029-root.json
  • 10:52 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:46 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1162', diff saved to https://phabricator.wikimedia.org/P15909 and previous config saved to /var/cache/conftool/dbconfig/20210511-102303-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 100%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15908 and previous config saved to /var/cache/conftool/dbconfig/20210511-102212-root.json
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 10:13 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 75%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15907 and previous config saved to /var/cache/conftool/dbconfig/20210511-100708-root.json
  • 09:54 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudgw2002-dev.codfw.wmnet
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 50%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15904 and previous config saved to /var/cache/conftool/dbconfig/20210511-095204-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1170:3312 (re)pooling @ 25%: Repool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15903 and previous config saved to /var/cache/conftool/dbconfig/20210511-093701-root.json
  • 09:23 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 08:37 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:36 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 08:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 08:34 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 08:32 moritzm: installing hivex security updates
  • 08:31 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:30 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1170:3312', diff saved to https://phabricator.wikimedia.org/P15901 and previous config saved to /var/cache/conftool/dbconfig/20210511-082038-marostegui.json
  • 08:19 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:55 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:54 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:40 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:39 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15899 and previous config saved to /var/cache/conftool/dbconfig/20210511-070742-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15898 and previous config saved to /var/cache/conftool/dbconfig/20210511-065238-root.json
  • 06:50 marostegui: Stop replication on db2094:3318 T282514
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15897 and previous config saved to /var/cache/conftool/dbconfig/20210511-063734-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182', diff saved to https://phabricator.wikimedia.org/P15896 and previous config saved to /var/cache/conftool/dbconfig/20210511-062231-root.json
  • 05:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1082.eqiad.wmnet
  • 05:36 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1082.eqiad.wmnet
  • 05:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1121.eqiad.wmnet with reason: REIMAGE
  • 05:11 marostegui: Reimage db1121 to buster, this will generate lag on s4 (commonswiki) on wikireplicas T280492
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 - going to be reimaged to buster T280492', diff saved to https://phabricator.wikimedia.org/P15895 and previous config saved to /var/cache/conftool/dbconfig/20210511-051102-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P15894 and previous config saved to /var/cache/conftool/dbconfig/20210511-050816-marostegui.json
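
The dbctl entries above follow a recurring pattern: a host is depooled, then repooled in steps of 25%, 50%, 75% and 100%, roughly 15 minutes apart. A minimal sketch of pulling those steps out of log lines follows. It assumes the line format shown in this log; the regular expression and the name `repool_steps` are illustrative and not part of dbctl. The sample lines are abbreviated copies of the db1182 entries.

```python
import re

# Matches entries such as:
#   07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: ...
REPOOL_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) \S+ dbctl commit \(dc=all\): "
    r"'(?P<host>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)

def repool_steps(lines):
    """Return {host: [(HH:MM, percent), ...]} in chronological order within one day."""
    steps = {}
    for line in lines:
        m = REPOOL_RE.search(line)
        if m:
            steps.setdefault(m["host"], []).append((m["time"], int(m["pct"])))
    # The log lists the newest entry first, so sort each host's steps by time.
    return {host: sorted(entries) for host, entries in steps.items()}

# Abbreviated sample lines (the text after the quoted message is omitted).
SAMPLE = [
    "07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repool db1182'",
    "06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repool db1182'",
    "06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repool db1182'",
    "06:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repool db1182'",
    "05:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182'",
]

print(repool_steps(SAMPLE))
```

Sorting by the `HH:MM` string only works within a single day's entries; a log spanning several dates would also need the date heading.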

2021-05-10

  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 779fb53: Update messages used for tech CoC (T280886) (duration: 00m 56s)
  • 23:32 urbanecm@deploy1002: Synchronized wmf-config/extension-list: ba8b786: NO-OP: Enable ChessBrowser on beta (T244075) (duration: 00m 57s)
  • 23:12 urbanecm@deploy1002: Synchronized wmf-config/logos.php: dd6fa65: Use ptwiki 20th anniversary logos (T281925) (duration: 00m 59s)
  • 23:08 urbanecm@deploy1002: Synchronized static/images/project-logos/: f2a76b1: Add ptwiki 20th anniversary logos (T281925) (duration: 00m 58s)
  • 22:28 eileen: civicrm revision changed from 2052d79248 to 38ac15233f, config revision is 47f21e4568
  • 21:59 dancy@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 00m 56s)
  • 21:45 dancy@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/MediaSearch/MediaSearch.i18n.php: Backport: Manually include I18nUtils class (T282206) (duration: 01m 01s)
  • 21:39 legoktm: nvm, downgraded flufl.bounce on lists1001
  • 21:26 legoktm: upgraded flufl.bounce on lists1001 and restarted mailman3 T282348
  • 20:44 andrew@deploy1002: Finished deploy [horizon/deploy@2604d7b]: more deployment fixes (duration: 03m 44s)
  • 20:41 andrew@deploy1002: Started deploy [horizon/deploy@2604d7b]: more deployment fixes
  • 20:40 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 02m 07s)
  • 20:38 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:35 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 55s)
  • 20:33 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:31 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 01m 21s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 36s)
  • 20:29 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:29 andrew@deploy1002: deploy aborted: update horizon to fix T282489 (duration: 00m 15s)
  • 20:28 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 20:25 andrew@deploy1002: Finished deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489 (duration: 04m 10s)
  • 20:21 andrew@deploy1002: Started deploy [horizon/deploy@6dc83bd]: update horizon to fix T282489
  • 18:34 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: loginwiki: Allow users to mark Notifications as read (T264834) (duration: 00m 57s)
  • 18:25 jforrester@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Disable LocalisationUpdate, part I (T158360) (duration: 00m 58s)
  • 18:24 XioNoX: add cmooney to all network devices
  • 18:18 jforrester@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: [wikitech] Enable VE desktop section edit links (T280291) (duration: 00m 57s)
  • 18:13 jforrester@deploy1002: Synchronized wmf-config: Config: wgAbuseFilterAflFilterMigrationStage: Stop setting, COMPAT_NEW is default (T269712) (duration: 00m 57s)
  • 18:10 jforrester@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: FlaggedRevs: Stop setting wgFlaggedRevsWhitelist, now ignored (duration: 00m 57s)
  • 18:08 legoktm: imported new mailman3, flufl.bounce packages to apt.wm.o
  • 16:27 jbond42: rm -r /var/lib/routinator/repository and rebuilding repo
  • 16:23 herron@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: arclamp/xenon: point all hosts to eqiad (mwlog1002) (T224565) (duration: 00m 59s)
  • 15:20 elukey: restart rsyslog on rpki1001
  • 14:32 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 13:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 100%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15892 and previous config saved to /var/cache/conftool/dbconfig/20210510-131434-root.json
  • 12:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 75%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15891 and previous config saved to /var/cache/conftool/dbconfig/20210510-125930-root.json
  • 12:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 50%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15890 and previous config saved to /var/cache/conftool/dbconfig/20210510-124427-root.json
  • 12:29 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1105:3312 (re)pooling @ 25%: Repool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15889 and previous config saved to /var/cache/conftool/dbconfig/20210510-122923-root.json
  • 12:27 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Initial deploy to cumin2002 - volans@cumin2002
  • 11:46 Urbanecm: EU B&C window done
  • 11:41 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3418237: Disabling Education Program namespaces in Russian Wikipedia (T282112) (duration: 00m 57s)
  • 11:37 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8bef11c: Add *.geograph.ie to the wgCopyUploadsDomains allowlist of Wikimedia Commons (T282007) (duration: 00m 57s)
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage --fix # T262155
  • 11:33 Urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki=jawikivoyage # T262155
  • 11:33 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 068cd7e: Change namespace name and aliases on jawikivoyage (T262155) (duration: 00m 57s)
  • 11:26 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 9209d96: Remove Vector language button from Commons, Wikidata, Mediawiki, Wikispecies (T281968) (duration: 00m 57s)
  • 11:20 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7f6f849: Add tmpSerializeEmptyListsAsObjects to Wikibase.php (T241422) (duration: 01m 01s)
  • 11:19 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 6138c64: Add tmpSerializeEmptyListsAsObjects Wikibase repo config (T241422) (duration: 00m 57s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 23271dd: Enable ReferencePreviews as full default on Marathi wiki (T282147) (duration: 00m 57s)
  • 11:09 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (3/3; T281972) (duration: 00m 56s)
  • 11:08 urbanecm@deploy1002: sync-file aborted: bd28391: DatabaseBlockStore: fetch correct ActorNormalization (T281972) (duration: 00m 04s)
  • 11:07 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/ServiceWiring.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (2/3; T281972) (duration: 00m 56s)
  • 11:05 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/block/DatabaseBlockStore.php: 85dc711: DatabaseBlockStore: fetch correct ActorNormalization (1/3; T281972) (duration: 00m 57s)
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312', diff saved to https://phabricator.wikimedia.org/P15888 and previous config saved to /var/cache/conftool/dbconfig/20210510-110125-marostegui.json
  • 10:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15887 and previous config saved to /var/cache/conftool/dbconfig/20210510-104119-root.json
  • 10:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 10:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:31 moritzm: installing openjdk-11 security updates
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15886 and previous config saved to /var/cache/conftool/dbconfig/20210510-102615-root.json
  • 10:22 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 10:18 vgutierrez: rolling restart of ATS backend instances to clear spurious warnings
  • 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1004.eqiad.wmnet
  • 10:13 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:13 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on maps1005.eqiad.wmnet with reason: Resyncing database from master
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15885 and previous config saved to /var/cache/conftool/dbconfig/20210510-101112-root.json
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15884 and previous config saved to /var/cache/conftool/dbconfig/20210510-095608-root.json
  • 09:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqiad - T281673
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 T281959', diff saved to https://phabricator.wikimedia.org/P15883 and previous config saved to /var/cache/conftool/dbconfig/20210510-094554-marostegui.json
  • 09:28 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 09:27 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1003.wikimedia.org
  • 09:26 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2006.wikimedia.org
  • 08:52 moritzm: installing bind9 security updates on stretch (client-side tools/libs only)
  • 08:48 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@esams - T281673
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1156 for schema change', diff saved to https://phabricator.wikimedia.org/P15881 and previous config saved to /var/cache/conftool/dbconfig/20210510-084102-marostegui.json
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid1001.eqiad.wmnet
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 100%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15880 and previous config saved to /var/cache/conftool/dbconfig/20210510-084040-root.json
  • 08:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid1001.eqiad.wmnet
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 75%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15879 and previous config saved to /var/cache/conftool/dbconfig/20210510-082536-root.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts failoid2001.codfw.wmnet
  • 08:24 XioNoX: push pfw policies - T282286
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts failoid2001.codfw.wmnet
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 50%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15878 and previous config saved to /var/cache/conftool/dbconfig/20210510-081033-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1146:3312 (re)pooling @ 25%: Repool db1146:3312', diff saved to https://phabricator.wikimedia.org/P15877 and previous config saved to /var/cache/conftool/dbconfig/20210510-075529-root.json
  • 07:38 hashar: Restarted CI Jenkins # T281737
  • 06:37 elukey: apt-get clean on rpki1001 to free some space
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1146:3312 for schema change', diff saved to https://phabricator.wikimedia.org/P15876 and previous config saved to /var/cache/conftool/dbconfig/20210510-063254-marostegui.json
  • 06:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 100%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15875 and previous config saved to /var/cache/conftool/dbconfig/20210510-063121-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 75%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15874 and previous config saved to /var/cache/conftool/dbconfig/20210510-061617-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 50%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15873 and previous config saved to /var/cache/conftool/dbconfig/20210510-060113-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1129 (re)pooling @ 25%: Repool db1129', diff saved to https://phabricator.wikimedia.org/P15872 and previous config saved to /var/cache/conftool/dbconfig/20210510-054610-root.json
  • 05:13 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1082 from dbctl T281794', diff saved to https://phabricator.wikimedia.org/P15871 and previous config saved to /var/cache/conftool/dbconfig/20210510-051334-marostegui.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129 for schema change', diff saved to https://phabricator.wikimedia.org/P15870 and previous config saved to /var/cache/conftool/dbconfig/20210510-050727-marostegui.json

2021-05-09

  • 21:44 legoktm: restarted mailman3 again (T282348) pymysql.err.InternalError: (1205, 'Lock wait timeout exceeded; try restarting transaction')
  • 18:28 legoktm: systemctl restart mailman3, bounce runner died again (T282348)
  • 10:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 10:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 180 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: T275605
  • 09:16 legoktm: mailman3: live-hacked the patch at https://phabricator.wikimedia.org/T282348#7072358 to fix the bounce queue
  • 06:21 legoktm: restarting mailman3 service, bounce runner died
  • 04:27 Amir1: starting upgrade of batch H of mailing lists (T280322)

2021-05-08

  • 17:18 Amir1: starting upgrade of batch G of mailing lists (T280322)

2021-05-07

  • 21:40 legoktm: deleted education@ from MM3, didn't import properly
  • 21:35 legoktm: deleted festivalsommer-teilnehmer from MM3, didn't import properly
  • 21:33 legoktm: fixed owner for wdqs-gui-build list
  • 19:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 18:55 legoktm: deleted daily-article-l from mailman3 after failed import
  • 18:33 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 18:28 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 18:27 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 18:23 brennen: 1.37.0-wmf.4 train status (T281145): blockers appear resolved, going ahead in the interest of not having a split deploy over the weekend
  • 17:50 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/cache/LinkBatch.php: Backport: LinkBatch: skip bad input (T282180 T282070) (duration: 01m 06s)
  • 17:25 andrew@deploy1002: Finished deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev (duration: 01m 55s)
  • 17:23 andrew@deploy1002: Started deploy [horizon/deploy@20f479e]: updated trove -> codfw1dev
  • 15:10 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 24s)
  • 15:08 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:03 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 11s)
  • 15:02 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:02 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 26s)
  • 15:00 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 15:00 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 29s)
  • 14:58 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:57 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 22s)
  • 14:56 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:41 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp203[34].codfw.wmnet
  • 14:40 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 01m 19s)
  • 14:38 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 14:38 andrew@deploy1002: Finished deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev (duration: 00m 50s)
  • 14:37 andrew@deploy1002: Started deploy [horizon/deploy@71f273c]: updated trove -> codfw1dev
  • 13:04 Urbanecm: Start server-side upload for 1 video file (T281927)
  • 12:19 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15856 and previous config saved to /var/cache/conftool/dbconfig/20210507-121908-kormat.json
  • 12:04 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15855 and previous config saved to /var/cache/conftool/dbconfig/20210507-120404-kormat.json
  • 11:49 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15854 and previous config saved to /var/cache/conftool/dbconfig/20210507-114859-kormat.json
  • 11:33 kormat@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: reimaged to buster T280751', diff saved to https://phabricator.wikimedia.org/P15853 and previous config saved to /var/cache/conftool/dbconfig/20210507-113355-kormat.json
  • 09:55 dcausse: depooling wdqs1012 T280382, T282222
  • 09:44 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@codfw - T281673
  • 08:50 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica2005.wikimedia.org
  • 08:19 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 08:15 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@eqsin - T281673
  • 08:10 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15849 and previous config saved to /var/cache/conftool/dbconfig/20210507-074725-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15848 and previous config saved to /var/cache/conftool/dbconfig/20210507-073222-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15847 and previous config saved to /var/cache/conftool/dbconfig/20210507-071718-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15846 and previous config saved to /var/cache/conftool/dbconfig/20210507-070214-root.json
  • 06:17 marostegui: Deploy schema change on s2 codfw, lag will appear T266486 T268392 T273360
  • 06:11 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 10s)
  • 06:09 tstarling@deploy1002: Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 06s)
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 for schema change', diff saved to https://phabricator.wikimedia.org/P15845 and previous config saved to /var/cache/conftool/dbconfig/20210507-055425-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 100%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15844 and previous config saved to /var/cache/conftool/dbconfig/20210507-055350-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 75%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15842 and previous config saved to /var/cache/conftool/dbconfig/20210507-053847-root.json
  • 05:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 50%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15841 and previous config saved to /var/cache/conftool/dbconfig/20210507-052343-root.json
  • 05:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 T282093', diff saved to https://phabricator.wikimedia.org/P15840 and previous config saved to /var/cache/conftool/dbconfig/20210507-051519-marostegui.json
  • 05:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1130 (re)pooling @ 25%: Repool db1130', diff saved to https://phabricator.wikimedia.org/P15839 and previous config saved to /var/cache/conftool/dbconfig/20210507-050839-root.json
  • 04:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 for schema change', diff saved to https://phabricator.wikimedia.org/P15837 and previous config saved to /var/cache/conftool/dbconfig/20210507-043350-marostegui.json

2021-05-06

  • 23:50 brennen@deploy1002: rebuilt and synchronized wikiversions files: Rollback group1 and group2 to 1.37.0-wmf.3 (T282193)
  • 22:52 legoktm: upgrading mailman3 and hyperkitty on lists1001 (T282092)
  • 22:11 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials/SpecialWatchlist.php: Backport: Reorder tables in SpecialWatchlist (T282181) (duration: 00m 57s)
  • 21:48 legoktm: upgraded mailman3 and hyperkitty on lists1002 (T282092)
  • 21:46 legoktm: uploaded new mailman3 and hyperkitty packages to apt.wm.o (T282092)
  • 21:11 hashar: restarted CI Jenkins due to T281737
  • 19:05 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.37.0-wmf.4
  • 19:04 ejegg: updated fundraising CiviCRM from 8034e47008 to 2052d79248
  • 18:58 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Migrate WikidataCompletionSearchClicks to event platform on all wikis (T282140) (duration: 01m 04s)
  • 18:55 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 338d1df: Wikibase: Use wikidataclient-test dblist for testwikidata localClientDatabases (T282160) (duration: 01m 05s)
  • 18:46 urbanecm@deploy1002: Synchronized wmf-config/Wikibase.php: 7e21cf0: NO-OP: Wikibase: Use wikidataclient dblist directly for repo localClientDatabases (T282160) (duration: 01m 04s)
  • 18:31 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Declare WikidataCompletionSearchClicks stream and migrate on testwiki - T282140 (duration: 01m 06s)
  • 17:59 volans@cumin2001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cumin1001.eqiad.wmnet
  • 17:59 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: END (FAIL) - Cookbook sre.hosts.remove-downtime (exit_code=99) for cumin1001.eqiad.wmnet
  • 17:47 volans@cumin2001: START - Cookbook sre.hosts.remove-downtime for cumin1001.eqiad.wmnet
  • 17:35 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:33 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:27 bblack@cumin1001: conftool action : set/pooled=no; selector: name=cp203[34].codfw.wmnet
  • 17:20 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:15 volans: upgrade spicerack on cumin* to 0.0.52
  • 17:15 ryankemper: [Elastic] Set `elastic2043` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 17:13 papaul: powerdown ms-be2057 for relocation
  • 17:13 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'proton' for release 'production' .
  • 17:12 volans: uploaded spicerack_0.0.52 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 17:00 papaul: powerdown elastic2058 for relocation
  • 16:43 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@ulsfo - T281673
  • 16:12 papaul: powerdown mc-gp2002 for relocation
  • 16:09 ryankemper: [Elastic] Set `elastic2058` as the only banned node in Cirrussearch Elasticsearch clusters (`elastic2058-production-search-codfw`, `elastic2058-production-search-omega-codfw`, `elastic2058-production-search-psi-codfw`)
  • 15:58 Amir1: starting upgrade of public mailing lists in group d and e (T280322)
  • 15:50 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:47 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1012.eqiad.wmnet with reason: REIMAGE
  • 15:42 papaul: powerdown logstash2027 for relocation
  • 15:41 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'zotero' for release 'production' .
  • 15:40 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:34 XioNoX: push cloud-gw-transport-eqiad to asw2-b-eqiad and cloudsw
  • 15:33 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1012.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:32 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2003.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 15:31 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'zotero' for release 'staging' .
  • 15:29 cdanis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:29 cdanis@cumin1001: START - Cookbook sre.hosts.downtime for 0:05:00 on cumin1001.eqiad.wmnet with reason: quiz
  • 15:26 ryankemper: T280382 [WDQS] Pooled `wdqs1007` and `wdqs2004`
  • 15:26 ryankemper: T280382 `wdqs2004.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:26 ryankemper: T280382 `wdqs1007.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 998G 1.5T 40% /srv`
  • 15:20 mvolz@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:16 mvolz@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'citoid' for release 'production' .
  • 15:14 papaul: powerdown ms-be2053 for relocation
  • 15:10 moritzm: imported wmfbackups 0.5+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 15:07 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 105 hosts with reason: T270704
  • 15:06 mvolz@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'citoid' for release 'staging' .
  • 15:05 moritzm: imported wmfmariadbpy 0.6+deb11u1 for bullseye-wikimedia to apt.wikimedia.org
  • 14:55 papaul: powerdown kafka-main2002 for relocation
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P15833 and previous config saved to /var/cache/conftool/dbconfig/20210506-143002-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15829 and previous config saved to /var/cache/conftool/dbconfig/20210506-140916-marostegui.json
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 100%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15828 and previous config saved to /var/cache/conftool/dbconfig/20210506-133738-root.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 75%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15827 and previous config saved to /var/cache/conftool/dbconfig/20210506-132234-root.json
  • 13:21 XioNoX: push pfw policies - T281942
  • 13:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 50%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15826 and previous config saved to /var/cache/conftool/dbconfig/20210506-130730-root.json
  • 12:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1144:3315 (re)pooling @ 25%: Repool db1144:3315', diff saved to https://phabricator.wikimedia.org/P15825 and previous config saved to /var/cache/conftool/dbconfig/20210506-125226-root.json
  • 11:44 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts eventlog1002.eqiad.wmnet
  • 11:35 mlitn@deploy1002: Synchronized wmf-config: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 01m 06s)
  • 11:34 mlitn@deploy1002: sync-file aborted: Config: Enable Extension:MediaSearch on betacommons (T265939) (duration: 00m 56s)
  • 11:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:31 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: REIMAGE
  • 11:30 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:28 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts eventlog1002.eqiad.wmnet
  • 11:27 hnowlan@cumin1001: START - Cookbook sre.hosts.decommission for hosts eventlog1002.eqiad.wmnet
  • 11:23 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:22 wmde-fisch@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: Enable ReferencePreviews as full default on pilot wikis (T271206) (duration: 01m 06s)
  • 11:12 kormat@cumin1001: dbctl commit (dc=all): 'db1173 depooling: Reimage to buster T280751', diff saved to https://phabricator.wikimedia.org/P15824 and previous config saved to /var/cache/conftool/dbconfig/20210506-111256-kormat.json
  • 11:12 kormat: reimaging db1173 to buster T280751
  • 10:59 volans: upgrading spicerack on cumin hosts to 0.0.51-1
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1144:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15823 and previous config saved to /var/cache/conftool/dbconfig/20210506-105909-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15822 and previous config saved to /var/cache/conftool/dbconfig/20210506-105850-root.json
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15821 and previous config saved to /var/cache/conftool/dbconfig/20210506-104346-root.json
  • 10:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15820 and previous config saved to /var/cache/conftool/dbconfig/20210506-102842-root.json
  • 10:19 jynus: stop dbprov2002 in advance of maintenance T281135
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repool db1110', diff saved to https://phabricator.wikimedia.org/P15819 and previous config saved to /var/cache/conftool/dbconfig/20210506-101339-root.json
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:55 akosiaris@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'external' .
  • 09:50 akosiaris@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'linkrecommendation' for release 'internal' .
  • 09:45 akosiaris@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'linkrecommendation' for release 'staging' .
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 for schema change', diff saved to https://phabricator.wikimedia.org/P15818 and previous config saved to /var/cache/conftool/dbconfig/20210506-092217-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 100%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15817 and previous config saved to /var/cache/conftool/dbconfig/20210506-091818-root.json
  • 09:03 elukey: sudo apt-get remove linux-image-4.19.0-11-amd64 linux-image-4.19.0-9-amd64 linux-image-4.19.0-13-amd64 on ping[123]001 hosts to free some space (tiny root partition, these are old kernels)
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 75%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15816 and previous config saved to /var/cache/conftool/dbconfig/20210506-090315-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 50%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15815 and previous config saved to /var/cache/conftool/dbconfig/20210506-084811-root.json
  • 08:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087 db1167', diff saved to https://phabricator.wikimedia.org/P15814 and previous config saved to /var/cache/conftool/dbconfig/20210506-084754-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 and db1167 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15813 and previous config saved to /var/cache/conftool/dbconfig/20210506-084443-marostegui.json
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15812 and previous config saved to /var/cache/conftool/dbconfig/20210506-083910-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3315 (re)pooling @ 25%: Repool db1096:3315', diff saved to https://phabricator.wikimedia.org/P15811 and previous config saved to /var/cache/conftool/dbconfig/20210506-083307-root.json
  • 08:27 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1007.eqiad.wmnet
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15810 and previous config saved to /var/cache/conftool/dbconfig/20210506-082406-root.json
  • 08:23 moritzm: imported wikimedia-lvs-realserver to apt.wikimedia.org/bullseye T275873
  • 08:18 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1007.eqiad.wmnet
  • 08:16 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1006.eqiad.wmnet
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15809 and previous config saved to /var/cache/conftool/dbconfig/20210506-080902-root.json
  • 08:06 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1006.eqiad.wmnet
  • 08:04 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1005.eqiad.wmnet
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315 for schema change', diff saved to https://phabricator.wikimedia.org/P15808 and previous config saved to /var/cache/conftool/dbconfig/20210506-075416-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repool db1160', diff saved to https://phabricator.wikimedia.org/P15807 and previous config saved to /var/cache/conftool/dbconfig/20210506-075359-root.json
  • 07:47 jynus: shutting down and removing db2098:s3 instance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1160 for schema change', diff saved to https://phabricator.wikimedia.org/P15806 and previous config saved to /var/cache/conftool/dbconfig/20210506-074746-marostegui.json
  • 07:45 ariel@cumin1001: START - Cookbook sre.hosts.decommission for hosts snapshot1005.eqiad.wmnet
  • 07:29 vgutierrez: Enforce Puppet Internal CA validation on trafficserver@cp[4026,4032] - T281673
  • 07:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 07:24 moritzm: installing exim security updates on bullseye hosts
  • 07:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15805 and previous config saved to /var/cache/conftool/dbconfig/20210506-064020-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 100%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15804 and previous config saved to /var/cache/conftool/dbconfig/20210506-062931-root.json
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15803 and previous config saved to /var/cache/conftool/dbconfig/20210506-062915-root.json
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15802 and previous config saved to /var/cache/conftool/dbconfig/20210506-062516-root.json
  • 06:20 elukey: apt-get clean on ping[1,2,3]001 to free some space
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 75%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15801 and previous config saved to /var/cache/conftool/dbconfig/20210506-061427-root.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15800 and previous config saved to /var/cache/conftool/dbconfig/20210506-061411-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15799 and previous config saved to /var/cache/conftool/dbconfig/20210506-061012-root.json
  • 06:01 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 06:00 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 50%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15798 and previous config saved to /var/cache/conftool/dbconfig/20210506-055923-root.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15797 and previous config saved to /var/cache/conftool/dbconfig/20210506-055907-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1083 T281445', diff saved to https://phabricator.wikimedia.org/P15796 and previous config saved to /var/cache/conftool/dbconfig/20210506-055535-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repool db1112 after checking its tables', diff saved to https://phabricator.wikimedia.org/P15795 and previous config saved to /var/cache/conftool/dbconfig/20210506-055509-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1079 (re)pooling @ 25%: Repool db1079', diff saved to https://phabricator.wikimedia.org/P15794 and previous config saved to /var/cache/conftool/dbconfig/20210506-054419-root.json
  • 05:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15793 and previous config saved to /var/cache/conftool/dbconfig/20210506-054404-root.json
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:43 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:38 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079 and db1158 to switch sanitarium masters', diff saved to https://phabricator.wikimedia.org/P15792 and previous config saved to /var/cache/conftool/dbconfig/20210506-053801-marostegui.json
  • 05:38 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1011.eqiad.wmnet --dest wdqs1007.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2008.codfw.wmnet --dest wdqs2004.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 05:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:32 tstarling@deploy1002: Synchronized php-1.37.0-wmf.4/includes/page/PageReferenceValue.php: fixing T282070 RC/log breakage due to unblocking autoblocks (duration: 01m 09s)
  • 05:27 effie: upgrade scap to 3.17.1-1 - T279695
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:53 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2004.codfw.wmnet with reason: REIMAGE
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: REIMAGE
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1007.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:38 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs2004.codfw.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 03:18 ryankemper: [Elastic] `elastic2043` is ssh unreachable. Power cycling it to bring it briefly back online - if it has the shard it should be able to repair the cluster state. Otherwise I'll have to delete the index for `enwiki_titlesuggest_1620184482` given the data would be unrecoverable
  • 03:08 ryankemper: [Elastic] `ryankemper@elastic2044:~$ curl -H 'Content-Type: application/json' -XPUT http://localhost:9200/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": null,"_name": null}}}'`
  • 03:08 ryankemper: [Elastic] Temporarily unbanning `elastic2033` and `elastic2043` from `production-search-codfw` to see if we can get the cluster green again. If it returns to green then we'll ban one node, wait for the shards to redistribute, and then ban the other
  • 03:06 ryankemper: [Elastic] I banned two nodes simultaneously earlier today - if there's an index with only 1 replica, and its primary and replica happened to be on the two nodes I banned, then that would have caused this situation
  • 03:04 ryankemper: [Elastic] It looks like we've got a single missing shard in `production-search-codfw` (port 9200), which is putting the cluster into red status. The cluster won't get back into green status without intervention
  • 02:56 ryankemper@cumin1001: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 02:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 00:35 Amir1: sudo service mailman3-web restart
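
The 03:08 unban entry above sends a transient cluster-settings update that resets the `_host` and `_name` allocation exclusions to null. As a rough sketch only (the helper names `clear_allocation_exclusions_body` and `as_curl` are hypothetical and not part of any Wikimedia tooling), the request body can be built and checked for balanced JSON before it is pasted into a shell:

```python
import json


def clear_allocation_exclusions_body():
    """Build a settings body that nulls the host- and name-based exclusions.

    The keys mirror the ones in the logged curl command; None serializes
    to JSON null, which resets a transient setting.
    """
    return {
        "transient": {
            "cluster.routing.allocation.exclude": {
                "_host": None,
                "_name": None,
            }
        }
    }


def as_curl(body, url="http://localhost:9200/_cluster/settings"):
    """Render the body as a curl command line (hypothetical helper).

    The default URL is the one from the log entry; nothing is sent over
    the network here, the function only returns a string.
    """
    payload = json.dumps(body, separators=(",", ":"))
    return f"curl -H 'Content-Type: application/json' -XPUT {url} -d '{payload}'"


if __name__ == "__main__":
    print(as_curl(clear_allocation_exclusions_body()))
```

Generating the payload with `json.dumps` rather than typing it by hand guarantees that every opening brace has a matching close inside the quoted argument.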

2021-05-05

  • 23:35 ryankemper: T281621 T281327 [Elastic] Banned `elastic2033` and `elastic2043` from the Cirrussearch Elasticsearch clusters
  • 23:10 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GlobalWatchlist/modules/SpecialGlobalWatchlist.display.css: 4947241: Fix centering of as-of label (duration: 01m 08s)
  • 22:13 mutante: welcome new deployer derick - user created on deploy1002 and bastions (T281564)
  • 22:05 mutante: pushing puppet run on all bastion hosts
  • 21:45 mutante: mailing lists: approved Alangi Derick's pending request for membership in ops mailing list (is becoming deployer) T281309
  • 21:37 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/CentralAuth/includes/CentralAuthUser.php: 52b134e: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 09s)
  • 21:34 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/CentralAuth/includes/CentralAuthUser.php: 6526884: Cross-wiki block should pass correct wiki blocker (T281972) (duration: 01m 08s)
  • 21:32 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.4/includes/user/UserIdentityValue.php: f189c46: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 09s)
  • 21:30 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/includes/user/UserIdentityValue.php: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 01m 11s)
  • 21:29 urbanecm@deploy1002: sync-file aborted: 8ffb52d: UserIdentityValue: Introduce convenience static factory methods (T281972) (duration: 00m 04s)
  • 20:37 ejegg: updated email preferences wiki (donorwiki) from d449599540 to 9f51ace546
  • 20:36 ejegg: updated payments-wiki from d449599540 to 9f51ace546
  • 20:20 ejegg: updated email preferences wiki (donorwiki) from a232fc3438 to d449599540
  • 19:59 jbond42: re-enable puppet post 685485
  • 19:53 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:21 brennen@deploy1002: Synchronized php: group1 wikis to 1.37.0-wmf.4 (duration: 01m 07s)
  • 19:19 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.4
  • 19:16 jbond42: ignore the last log message; will wait for deploy to finish
  • 19:16 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/tests/phpunit/includes: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 10s)
  • 19:16 jbond42: disable puppet: rolling out change (685485) which affects all hosts
  • 19:14 brennen@deploy1002: Synchronized php-1.37.0-wmf.4/includes/specials: Backport: Fix order of joins in SpecialRecentChanges (T281981) (duration: 01m 08s)
  • 19:10 Amir1: starting migration of public mailing lists in group b and c to mailman3 (T280322)
  • 19:01 brennen: 1.37.0-wmf.4 train status (T281145): deploying patch for T282038 and then rolling forward to group1.
  • 18:59 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[46].eqsin.wmnet
  • 18:50 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp501[35].eqsin.wmnet
  • 18:43 tgr_: Morning deploys done
  • 18:43 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:42 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/modules/homepage/addlink/AddLinkArticleTarget.js: Backport: Prevent edit notices from appearing (T281960) (duration: 01m 08s)
  • 18:40 tgr@deploy1002: Synchronized wmf-config/flaggedrevs.php: Config: flaggedrevs.php: Use MediaWikiServices, not an extension function (duration: 01m 08s)
  • 18:34 tgr@deploy1002: Synchronized php-1.37.0-wmf.4/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 08s)
  • 18:33 tgr@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/includes: Backport: Enable Reference Previews for more users (T271206) (duration: 01m 11s)
  • 18:24 tgr@deploy1002: Synchronized wmf-config/ProductionServices.php: Config: replace mwlog1001 with new mwlog[12]002 hosts (T224565) (duration: 01m 24s)
  • 17:59 bblack@cumin1001: conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
  • 17:59 bblack@cumin1001: conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe
  • 17:59 mutante: adding a systemd timer to all thumbor servers that writes output of fc-list command into /srv/fc-list/fc-list (T280718)
  • 17:58 XioNoX: push pfw policies - T281942
  • 17:10 ejegg: updated standalone SmashPig deploy from 250a8570d1 to be272c02ce
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 100%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15786 and previous config saved to /var/cache/conftool/dbconfig/20210505-155453-root.json
  • 15:43 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga2001.wikimedia.org
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 75%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15785 and previous config saved to /var/cache/conftool/dbconfig/20210505-153949-root.json
  • 15:25 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga2001.wikimedia.org
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 50%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15784 and previous config saved to /var/cache/conftool/dbconfig/20210505-152445-root.json
  • 15:23 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts icinga1001.wikimedia.org
  • 15:11 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts icinga1001.wikimedia.org
  • 15:10 herron: decommissioning icinga[12]001 hosts T279601 T279602
  • 15:10 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:10 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: Table check on db2129 T280751
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 30%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15783 and previous config saved to /var/cache/conftool/dbconfig/20210505-150942-root.json
  • 14:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 20%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15782 and previous config saved to /var/cache/conftool/dbconfig/20210505-145438-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15781 and previous config saved to /var/cache/conftool/dbconfig/20210505-144431-root.json
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 10%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15780 and previous config saved to /var/cache/conftool/dbconfig/20210505-143934-root.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15779 and previous config saved to /var/cache/conftool/dbconfig/20210505-142927-root.json
  • 14:28 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:28 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 10 hosts with reason: Reimage db2129 T280751
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1126 (re)pooling @ 5%: Repool db1126', diff saved to https://phabricator.wikimedia.org/P15778 and previous config saved to /var/cache/conftool/dbconfig/20210505-142431-root.json
  • 14:19 kormat@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:18 marostegui: Upgrade kernel and enable report_host on db1126
  • 14:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 to enable report_host', diff saved to https://phabricator.wikimedia.org/P15777 and previous config saved to /var/cache/conftool/dbconfig/20210505-141735-marostegui.json
  • 14:17 kormat@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2129.codfw.wmnet with reason: REIMAGE
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15776 and previous config saved to /var/cache/conftool/dbconfig/20210505-141423-root.json
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: Repool db1165', diff saved to https://phabricator.wikimedia.org/P15775 and previous config saved to /var/cache/conftool/dbconfig/20210505-135920-root.json
  • 13:58 kevinbazira@deploy1002: Finished deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723 (duration: 16m 47s)
  • 13:48 wmde-fisch@deploy1002: Synchronized wmf-config/CommonSettings.php: Config: Revert "Enable ReferencePreviews on first wikis CommonSettings" () (duration: 02m 08s)
  • 13:41 kevinbazira@deploy1002: Started deploy [ores/deploy@5612f30]: Regular ORES Deployment T278723
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1165 for schema change', diff saved to https://phabricator.wikimedia.org/P15774 and previous config saved to /var/cache/conftool/dbconfig/20210505-133259-marostegui.json
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15773 and previous config saved to /var/cache/conftool/dbconfig/20210505-133202-root.json
  • 13:19 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:19 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 9 hosts with reason: Reimage db2129 T280751
  • 13:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15772 and previous config saved to /var/cache/conftool/dbconfig/20210505-131658-root.json
  • 13:12 kormat: reimaging db2129 to buster T280751
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15771 and previous config saved to /var/cache/conftool/dbconfig/20210505-130155-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repool db1180', diff saved to https://phabricator.wikimedia.org/P15770 and previous config saved to /var/cache/conftool/dbconfig/20210505-124651-root.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180 for schema change', diff saved to https://phabricator.wikimedia.org/P15769 and previous config saved to /var/cache/conftool/dbconfig/20210505-122351-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15768 and previous config saved to /var/cache/conftool/dbconfig/20210505-121353-root.json
  • 12:01 moritzm: installing exim security updates on stretch
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15767 and previous config saved to /var/cache/conftool/dbconfig/20210505-115849-root.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15765 and previous config saved to /var/cache/conftool/dbconfig/20210505-114345-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repool db1173', diff saved to https://phabricator.wikimedia.org/P15764 and previous config saved to /var/cache/conftool/dbconfig/20210505-112842-root.json
  • 11:25 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: 3565427: Enable ReferencePreviews on first wikis (T271206; 2/2) (duration: 01m 10s)
  • 11:24 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 4f3051b: Enable ReferencePreviews on first wikis (T271206; 1/2) (duration: 01m 20s)
  • 11:17 urbanecm@deploy1002: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 289dc34: Enable new language button for all logged in users outside test projects (T280526) (duration: 02m 24s)
  • 10:29 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 09:54 hashar: Restarted Zuul / CI
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15762 and previous config saved to /var/cache/conftool/dbconfig/20210505-094945-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 100%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15761 and previous config saved to /var/cache/conftool/dbconfig/20210505-094005-root.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15760 and previous config saved to /var/cache/conftool/dbconfig/20210505-093441-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 80%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15759 and previous config saved to /var/cache/conftool/dbconfig/20210505-092501-root.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15758 and previous config saved to /var/cache/conftool/dbconfig/20210505-091938-root.json
  • 09:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 70%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15757 and previous config saved to /var/cache/conftool/dbconfig/20210505-090957-root.json
  • 09:08 hashar: Upgraded Jenkins ldap plugin from 1.26 to 2.6 # T281737
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repool db1168', diff saved to https://phabricator.wikimedia.org/P15756 and previous config saved to /var/cache/conftool/dbconfig/20210505-090434-root.json
  • 08:55 hashar: Restarting CI Jenkins # T281737
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 60%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15755 and previous config saved to /var/cache/conftool/dbconfig/20210505-085454-root.json
  • 08:50 jgiannelos@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:47 jgiannelos@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'mobileapps' for release 'production' .
  • 08:41 jgiannelos@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'mobileapps' for release 'staging' .
  • 08:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 50%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15754 and previous config saved to /var/cache/conftool/dbconfig/20210505-083950-root.json
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 for schema change', diff saved to https://phabricator.wikimedia.org/P15753 and previous config saved to /var/cache/conftool/dbconfig/20210505-083810-marostegui.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P15752 and previous config saved to /var/cache/conftool/dbconfig/20210505-082609-marostegui.json
  • 08:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 35%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15751 and previous config saved to /var/cache/conftool/dbconfig/20210505-082446-root.json
  • 08:13 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org buster-wikimedia
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 30%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15750 and previous config saved to /var/cache/conftool/dbconfig/20210505-080942-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 25%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15749 and previous config saved to /var/cache/conftool/dbconfig/20210505-075438-root.json
  • 07:53 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 20%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15748 and previous config saved to /var/cache/conftool/dbconfig/20210505-073934-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15747 and previous config saved to /var/cache/conftool/dbconfig/20210505-073722-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 100%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15746 and previous config saved to /var/cache/conftool/dbconfig/20210505-073653-root.json
  • 07:35 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 07:35 moritzm: rolling restart of cassandra in eqiad to pick up Java security updates
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15745 and previous config saved to /var/cache/conftool/dbconfig/20210505-073416-root.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 100%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15744 and previous config saved to /var/cache/conftool/dbconfig/20210505-073223-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 15%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15743 and previous config saved to /var/cache/conftool/dbconfig/20210505-072431-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 75%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15742 and previous config saved to /var/cache/conftool/dbconfig/20210505-072149-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15741 and previous config saved to /var/cache/conftool/dbconfig/20210505-071912-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 75%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15740 and previous config saved to /var/cache/conftool/dbconfig/20210505-071720-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1082 T281794', diff saved to https://phabricator.wikimedia.org/P15739 and previous config saved to /var/cache/conftool/dbconfig/20210505-071132-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 10%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15738 and previous config saved to /var/cache/conftool/dbconfig/20210505-070927-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 50%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15737 and previous config saved to /var/cache/conftool/dbconfig/20210505-070646-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15736 and previous config saved to /var/cache/conftool/dbconfig/20210505-070409-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 50%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15735 and previous config saved to /var/cache/conftool/dbconfig/20210505-070216-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 5%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15734 and previous config saved to /var/cache/conftool/dbconfig/20210505-065423-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1096:3316 (re)pooling @ 25%: Repool db1096:3316 after schema change', diff saved to https://phabricator.wikimedia.org/P15733 and previous config saved to /var/cache/conftool/dbconfig/20210505-065142-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repool db1156', diff saved to https://phabricator.wikimedia.org/P15732 and previous config saved to /var/cache/conftool/dbconfig/20210505-064905-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1074 (re)pooling @ 25%: Repool db1074', diff saved to https://phabricator.wikimedia.org/P15731 and previous config saved to /var/cache/conftool/dbconfig/20210505-064712-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 and db1156 to switch sanitarium hosts T280492', diff saved to https://phabricator.wikimedia.org/P15730 and previous config saved to /var/cache/conftool/dbconfig/20210505-064204-marostegui.json
  • 06:41 marostegui: Check tables on db1112 (lag might show up on s3 on wiki replicas) T280492
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 3%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15729 and previous config saved to /var/cache/conftool/dbconfig/20210505-063920-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 2%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15728 and previous config saved to /var/cache/conftool/dbconfig/20210505-062416-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1178 (re)pooling @ 1%: Slowly pool db1178 into s8 T275633', diff saved to https://phabricator.wikimedia.org/P15727 and previous config saved to /var/cache/conftool/dbconfig/20210505-060912-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1178 into dbctl T275633', diff saved to https://phabricator.wikimedia.org/P15726 and previous config saved to /var/cache/conftool/dbconfig/20210505-060814-marostegui.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from API', diff saved to https://phabricator.wikimedia.org/P15725 and previous config saved to /var/cache/conftool/dbconfig/20210505-060636-marostegui.json
  • 06:00 marostegui: Restart mysqld on x1 database primary master (db1103) T281212
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3311 into main traffic', diff saved to https://phabricator.wikimedia.org/P15724 and previous config saved to /var/cache/conftool/dbconfig/20210505-053841-marostegui.json
  • 05:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1106 into s1 vslow, remove db1099:3311', diff saved to https://phabricator.wikimedia.org/P15723 and previous config saved to /var/cache/conftool/dbconfig/20210505-053211-marostegui.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 for schema change', diff saved to https://phabricator.wikimedia.org/P15722 and previous config saved to /var/cache/conftool/dbconfig/20210505-052943-marostegui.json
  • 04:53 eileen: civicrm revision changed from e7c610fd87 to 8034e47008, config revision is 189788d452
  • 03:58 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_codfw "codfw reboot" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563` on `ryankemper@cumin1001` tmux session `elastic_restarts`
  • 03:58 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw reboot - ryankemper@cumin1001 - T280563
  • 03:56 ryankemper: T280563 Reboot of `eqiad` complete. Only ~half of `codfw` is remaining.
  • 03:56 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:54 ryankemper: T280382 `wdqs1011.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:52 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:51 ryankemper: T280382 [WDQS] `ryankemper@wdqs2007:~$ sudo depool` (need to monitor host to see if it becomes ssh unreachable again or if it was a one-off; also high update lag)
  • 03:50 ryankemper: T280382 `wdqs2007.codfw.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/mapper/vg0-srv 2.7T 998G 1.6T 39% /srv`
  • 03:07 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 02:59 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 01:55 ryankemper: T281327 [Elastic] Unbanned `elastic2043` from cluster
  • 01:50 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:49 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage` (will likely fail due to underlying hw but we'll see)
  • 01:47 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 01:45 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:43 ryankemper: T280382 [WDQS] `racadm>>racadm serveraction powercycle` on `wdqs2007`
  • 01:39 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs1006.eqiad.wmnet --dest wdqs1011.eqiad.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 01:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 01:36 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 00:29 eileen: civicrm revision changed from 94e321dbe0 to e7c610fd87, config revision is 189788d452
  • 00:15 ejegg: updated payments-wiki from 44570561f2 to d449599540
  • 00:08 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 3f6ea8c: Growth: enwiki: Add list of mentors (T281896) (duration: 01m 10s)
  • 00:00 urbanecm@deploy1002: Synchronized fc-list: 9397049: update fc-list to current version on buster (T79424) (duration: 01m 09s)

2021-05-04

  • 23:41 urbanecm@deploy1002: Synchronized wmf-config/config/enwiki.yaml: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 3/3) (duration: 01m 09s)
  • 23:40 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 2/3) (duration: 01m 09s)
  • 23:38 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d29dbb2: Enable Growth features on enwiki in the dark mode (T281896; 1/3) (duration: 01m 09s)
  • 23:31 urbanecm@deploy1002: Synchronized wmf-config/config/bgwiki.yaml: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 01m 09s)
  • 23:30 urbanecm@deploy1002: sync-file aborted: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 3/3) (duration: 00m 03s)
  • 23:30 urbanecm@deploy1002: Synchronized dblists/growthexperiments.dblist: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 2/3) (duration: 01m 09s)
  • 23:28 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 5b4c516: Enable Growth team features in dark mode on bgwiki (T280824; 1/3) (duration: 01m 09s)
  • 23:26 Urbanecm: Create tables for GrowthExperiments extension on enwiki (T281896)
  • 23:24 Urbanecm: Create tables for GrowthExperiments extension on bgwiki (T280824)
  • 23:22 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: a3c24f3: Avoid using User::getGroups() and ::getEffectiveGroups() (T281823) (duration: 01m 10s)
  • 23:13 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: e467d92: Add extendedconfirmed on ptwiki (T281926) (duration: 01m 10s)
  • 23:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 012d613: Add extendedconfirmed on azwiki (T281860) (duration: 01m 10s)
  • 22:49 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:47 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:46 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: REIMAGE
  • 22:44 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: REIMAGE
  • 22:42 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: REIMAGE
  • 21:30 eileen: civicrm revision changed from 33a63d5789 to 94e321dbe0, config revision is a212d6ab23
  • 21:17 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4 (duration: 03m 55s)
  • 21:13 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@06a4a3e]: Bump glent to 0.2.4
  • 20:13 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:10 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 20:09 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7] (duration: 05m 16s)
  • 20:04 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (hadoop-test): Regular analytics weekly train HADOOP-TEST [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7] (duration: 00m 07s)
  • 20:03 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7] (thin): Regular analytics weekly train THIN [analytics/refinery@0dc3ae7]
  • 20:03 joal@deploy1002: Finished deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7] (duration: 17m 15s)
  • 19:46 joal@deploy1002: Started deploy [analytics/refinery@0dc3ae7]: Regular analytics weekly train [analytics/refinery@0dc3ae7]
  • 19:38 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.37.0-wmf.4
  • 17:58 dancy@deploy1002: Finished scap: testwikis wikis to 1.37.0-wmf.4 (duration: 42m 33s)
  • 17:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead (duration: 01m 46s)
  • 17:24 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@0c4538f]: Increase convert_to_esbulk memory overhead
  • 17:16 dancy@deploy1002: Started scap: testwikis wikis to 1.37.0-wmf.4
  • 17:03 brennen: 1.37.0-wmf.4 was branched at f069fd8 for T281145
  • 17:00 volans: uploaded spicerack_0.0.51 to apt.wikimedia.org bullseye-wikimedia
  • 16:26 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead (duration: 01m 54s)
  • 16:25 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@e6ae572]: Increase convert_to_esbulk memory overhead
  • 16:16 dzahn@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:15 dzahn@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:13 mutante: k8s: upgrading release=namespaces, helmfile apply to create miscweb namespace T281538
  • 16:13 dzahn@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 16:12 hnowlan@deploy1002: helmfile [eqiad] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:12 dzahn@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 16:07 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 15:59 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 14:41 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 13:46 moritzm: installing exim security updates on buster
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15721 and previous config saved to /var/cache/conftool/dbconfig/20210504-133950-root.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15720 and previous config saved to /var/cache/conftool/dbconfig/20210504-132446-root.json
  • 13:14 moritzm: upgrading linux-libc-dev on buster hosts (to version introduced by 10.9 point release)
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 13:12 hnowlan@deploy1002: helmfile [codfw] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 50%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15719 and previous config saved to /var/cache/conftool/dbconfig/20210504-130943-root.json
  • 13:01 moritzm: installing debian-archive-keyring updates on buster
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Repool db1137 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15718 and previous config saved to /var/cache/conftool/dbconfig/20210504-125439-root.json
  • 12:50 marostegui: Upgrade mysql and kernel on db1137 T281212
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1137 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15717 and previous config saved to /var/cache/conftool/dbconfig/20210504-124937-marostegui.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15716 and previous config saved to /var/cache/conftool/dbconfig/20210504-124848-root.json
  • 12:46 kormat@cumin1001: dbctl commit (dc=all): 'Repooling after sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15715 and previous config saved to /var/cache/conftool/dbconfig/20210504-124647-kormat.json
  • 12:35 kormat@cumin1001: dbctl commit (dc=all): 'Depooling for sanitarium master switch T280751', diff saved to https://phabricator.wikimedia.org/P15714 and previous config saved to /var/cache/conftool/dbconfig/20210504-123537-kormat.json
  • 12:34 kormat@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:34 kormat@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Replace db1085 with db1165 T280751
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15713 and previous config saved to /var/cache/conftool/dbconfig/20210504-123344-root.json
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'staging' .
  • 12:27 hnowlan@deploy1002: helmfile [staging] Ran 'sync' command on namespace 'api-gateway' for release 'production' .
  • 12:22 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 683b876: 5763630: GrowthExperiments: Rename control variant to control, GrowthExperiments: Set linkrecommendation variant to 0 (T281727) (duration: 00m 58s)
  • 12:20 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments/: 8f938c2: c8c07ab: GrowthExperiments backports (T281727) (duration: 00m 59s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15712 and previous config saved to /var/cache/conftool/dbconfig/20210504-121841-root.json
  • 12:08 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repool db1120 after mysql upgrade', diff saved to https://phabricator.wikimedia.org/P15711 and previous config saved to /var/cache/conftool/dbconfig/20210504-120337-root.json
  • 11:58 marostegui: Upgrade mysql and kernel on db1120 T281212
  • 11:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 to upgrade its mysql T281212', diff saved to https://phabricator.wikimedia.org/P15710 and previous config saved to /var/cache/conftool/dbconfig/20210504-115634-marostegui.json
  • 11:40 jmm@cumin2001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
  • 11:31 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] );` on arwiki, bnwiki, viwiki (T278710, T281703)
  • 11:27 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 87dff0b: GrowthExperiments: Enable link recommendations for target wikis (T278710) (duration: 00m 57s)
  • 11:10 Urbanecm: Create growthexperiments_link_recommendations and growthexperiments_link_submissions on arwiki,bnwiki,viwiki x1 (T266913)
  • 11:06 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 8228f6b: Disable ContentTranslation New article campaign in fiwiki (T277473) (duration: 00m 59s)
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15707 and previous config saved to /var/cache/conftool/dbconfig/20210504-102649-root.json
  • 10:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15705 and previous config saved to /var/cache/conftool/dbconfig/20210504-101145-root.json
  • 09:57 moritzm: installing bind9 security updates on buster (client side tools/libs only)
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15704 and previous config saved to /var/cache/conftool/dbconfig/20210504-095642-root.json
  • 09:45 godog: +50G for prometheus k8s in codfw
  • 09:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repool db1167', diff saved to https://phabricator.wikimedia.org/P15703 and previous config saved to /var/cache/conftool/dbconfig/20210504-094138-root.json
  • 09:04 jmm@cumin2001: START - Cookbook sre.cassandra.roll-restart
  • 09:04 moritzm: rolling restart of cassandra in codfw to pick up Java security updates
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 100%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15702 and previous config saved to /var/cache/conftool/dbconfig/20210504-081716-root.json
  • 08:02 marostegui: Check tables on db1106, lag will show up on s1 on wiki replicas (T280492)
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15701 and previous config saved to /var/cache/conftool/dbconfig/20210504-080213-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 75%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15700 and previous config saved to /var/cache/conftool/dbconfig/20210504-080212-root.json
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 from s1 vslow to get its tables checked and pool db1099:3311 instead T280492', diff saved to https://phabricator.wikimedia.org/P15699 and previous config saved to /var/cache/conftool/dbconfig/20210504-080206-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15698 and previous config saved to /var/cache/conftool/dbconfig/20210504-074639-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 50%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15697 and previous config saved to /var/cache/conftool/dbconfig/20210504-074632-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15696 and previous config saved to /var/cache/conftool/dbconfig/20210504-073135-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 25%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15695 and previous config saved to /var/cache/conftool/dbconfig/20210504-073127-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repool db1161', diff saved to https://phabricator.wikimedia.org/P15694 and previous config saved to /var/cache/conftool/dbconfig/20210504-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1082 (re)pooling @ 10%: Repool db1082', diff saved to https://phabricator.wikimedia.org/P15693 and previous config saved to /var/cache/conftool/dbconfig/20210504-071623-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 and db1082 to change s5 sanitarium master T280492', diff saved to https://phabricator.wikimedia.org/P15692 and previous config saved to /var/cache/conftool/dbconfig/20210504-071146-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15691 and previous config saved to /var/cache/conftool/dbconfig/20210504-065034-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15690 and previous config saved to /var/cache/conftool/dbconfig/20210504-063530-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15689 and previous config saved to /var/cache/conftool/dbconfig/20210504-062027-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15688 and previous config saved to /var/cache/conftool/dbconfig/20210504-061700-root.json
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15687 and previous config saved to /var/cache/conftool/dbconfig/20210504-060523-root.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15686 and previous config saved to /var/cache/conftool/dbconfig/20210504-060156-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15684 and previous config saved to /var/cache/conftool/dbconfig/20210504-055116-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repool db1158', diff saved to https://phabricator.wikimedia.org/P15683 and previous config saved to /var/cache/conftool/dbconfig/20210504-055020-root.json
  • 05:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15682 and previous config saved to /var/cache/conftool/dbconfig/20210504-054653-root.json
  • 05:45 marostegui: Stop mysql on db1158 to clone db1178
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158 to clone db1178 T275633', diff saved to https://phabricator.wikimedia.org/P15680 and previous config saved to /var/cache/conftool/dbconfig/20210504-054539-marostegui.json
  • 05:36 marostegui: Deploy schema change on s6 codfw, lag will appear - T266486 T268392 T273360
  • 05:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Repool db1118', diff saved to https://phabricator.wikimedia.org/P15678 and previous config saved to /var/cache/conftool/dbconfig/20210504-053149-root.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15677 and previous config saved to /var/cache/conftool/dbconfig/20210504-052612-root.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15676 and previous config saved to /var/cache/conftool/dbconfig/20210504-051108-root.json
  • 05:07 marostegui: Restart sanitarium hosts to pick up new filters T263817
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15675 and previous config saved to /var/cache/conftool/dbconfig/20210504-045605-root.json
  • 04:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repool db1121', diff saved to https://phabricator.wikimedia.org/P15674 and previous config saved to /var/cache/conftool/dbconfig/20210504-044101-root.json
  • 04:06 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:38 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 03:36 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 03:35 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 02:09 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 02:07 pt1979@cumin2001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet with reason: REIMAGE
  • 01:41 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563

2021-05-03

  • 23:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 230ef57: Prepare for new configuration option (T277951) (duration: 00m 57s)
  • 23:15 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 57s)
  • 23:14 urbanecm@deploy1002: sync-file aborted: 7c47ee1: Replace $wgRelatedArticlesFooterWhitelistedSkins (T277958) (duration: 00m 01s)
  • 22:17 legoktm: ran disable_list for: iegcom wikien-l fundraiser spcommittee-private-l spcommittee-l mediation-en-l test-second wikifr-colloque-l
  • 22:14 mutante: [backup1001:~] $ sudo check_bacula.py --icinga
  • 21:56 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:54 ryankemper: T280563 eqiad reboot failed with: `curator.exceptions.FailedExecution: Exception encountered. Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPSConnectionPool(host='search.svc.eqiad.wmnet', port=9243): Read timed out. (read timeout=10))`
  • 21:52 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:47 ryankemper: T280563 `sudo -i cookbook sre.elasticsearch.rolling-operation search_eqiad "eqiad reboot to apply sec updates" --reboot --nodes-per-run 3 --start-datetime 2021-04-29T23:04:29 --task-id T280563`
  • 21:46 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 21:32 krinkle@deploy1002: Synchronized wmf-config/InitialiseSettings.php: d95b91648 (duration: 00m 58s)
  • 21:27 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:25 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: REIMAGE
  • 21:22 ryankemper: [WDQS] `ryankemper@wdqs1003:~$ sudo pool`
  • 21:20 ryankemper: T280382 [WDQS] `ryankemper@puppetmaster1001:~$ sudo confctl select 'name=wdqs1011.eqiad.wmnet' set/pooled=no`
  • 21:19 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1011.eqiad.wmnet
  • 21:09 ryankemper: T280382 `sudo -i wmf-auto-reimage-host -p T280382 wdqs1011.eqiad.wmnet` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:06 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 21:05 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 ryankemper: T280382 `wdqs1010.eqiad.wmnet` has been re-imaged and had the appropriate wikidata/categories journal files transferred. `df -h` shows disk space is no longer an issue following the switch to `raid0`: `/dev/md2 2.6T 975G 1.5T 39% /srv`
  • 20:56 ryankemper: T280382 [WDQS] `ryankemper@wdqs2001:~$ sudo run-puppet-agent --force`
  • 20:44 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:42 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:37 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --source wdqs2001.codfw.wmnet --dest wdqs2007.codfw.wmnet --reason "transferring fresh categories journal following reimage" --blazegraph_instance categories` on `ryankemper@cumin1001` tmux session `reimage`
  • 20:37 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:40 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:39 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2007.codfw.wmnet with reason: REIMAGE
  • 19:24 ryankemper: T280382 `sudo -i cookbook sre.wdqs.data-transfer --without-lvs --source wdqs1003.eqiad.wmnet --dest wdqs1010.eqiad.wmnet --reason "transferring fresh wikidata journal following reimage" --blazegraph_instance blazegraph` on `ryankemper@cumin1001` tmux session `reimage`
  • 19:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:21 ryankemper@puppetmaster1001: conftool action : set/pooled=no; selector: name=wdqs1004.eqiad.wmnet
  • 19:21 ryankemper: T280382 [WDQS] `sudo confctl select 'name=wdqs1004.eqiad.wmnet' set/pooled=no` (`wdqs1004` failed re-image [not sure why yet] and won't let me ssh in to depool so using conftool instead)
  • 18:20 Urbanecm: Morning B&C window done
  • 18:19 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/RelatedArticles/resources/ext.relatedArticles.readMore.bootstrap/index.js: cf9d9da: Hotfix: loadRelatedArticles should consider existence of container element (T281547) (duration: 00m 57s)
  • 18:15 urbanecm@deploy1002: Synchronized wmf-config/filebackend.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 2/2) (duration: 00m 57s)
  • 18:14 urbanecm@deploy1002: Synchronized wmf-config/CommonSettings.php: bc1bc90: NOOP: beta: Use upload.wikimedia.beta.wmflabs.o for uploads (T281650; 1/2) (duration: 00m 58s)
  • 17:44 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 17:20 hashar: Restarting CI Jenkins due to "Gearman worker contint2001.wikimedia.org_manager" thread dying unexpectedly # T281737
  • 16:30 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation reboot without plugin upgrade (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad reboot to apply sec updates - ryankemper@cumin1001 - T280563
  • 16:29 ryankemper: T281498 `sudo confctl select 'name=wdqs2004.codfw.wmnet' set/pooled=yes:weight=10` after merge of https://gerrit.wikimedia.org/r/c/operations/puppet/+/684435
  • 16:27 ryankemper@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2004.codfw.wmnet
  • 16:19 legoktm: legoktm@lists1001:~$ sudo apt install default-mysql-client # for temporary debugging
  • 15:48 pt1979@cumin2001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 pt1979@cumin2001: START - Cookbook sre.dns.netbox
  • 15:27 Amir1: upgrade group A to mailman3 (T280322)
  • 14:27 volans: uploaded conftool_1.3.1 to apt.wikimedia.org bullseye-wikimedia
  • 13:43 volans: uploaded cumin_4.1.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia
  • 13:10 Urbanecm: Run `User::newSystemUser( 'Maintenance script', [ 'steal' => true ] )` on cswiki to make the user a proper system user (T281703)
  • 12:36 kostajh: Backport window done
  • 12:33 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: Set default variant (T278123); GrowthExperiments: enable link recommendations frontend on cswiki (T278710) (duration: 00m 57s)
  • 12:07 kharlan@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Config: GrowthExperiments: enable link recommendations backend on cswiki (T278710) (duration: 00m 57s)
  • 11:56 kharlan@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/GrowthExperiments: Backport: refreshLinkRecommendations.php: Use per-wiki locks; Handle DB readonly errors (T281382) (duration: 00m 58s)
  • 11:15 urbanecm@deploy1002: Synchronized php-1.37.0-wmf.3/extensions/Popups/: a438b64: Fix settings dialog offering ReferencePreviews when unavailable (T281352) (duration: 00m 58s)
  • 11:11 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: c5a7c67: Set wgGEMentorshipMigrationStage to SCHEMA_COMPAT_NEW everywhere (T279853) (duration: 00m 57s)
  • 11:04 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: f1a5ef0: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)
  • 10:59 moritzm: installing avahi security updates on buster
  • 10:47 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 57s)
  • 10:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 58s)
  • 09:42 moritzm: installing python3.7 security updates
  • 09:41 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a] (duration: 29m 24s)
  • 09:12 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (hadoop-test): Hotfix analytics deploy (monthly sqoop) HADOOP-TEST [analytics/refinery@584ed6a]
  • 09:10 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a] (duration: 00m 07s)
  • 09:10 joal@deploy1002: Started deploy [analytics/refinery@584ed6a] (thin): Hotfix analytics deploy (monthly sqoop) THIN [analytics/refinery@584ed6a]
  • 09:09 joal@deploy1002: Finished deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a] (duration: 16m 06s)
  • 08:52 joal@deploy1002: Started deploy [analytics/refinery@584ed6a]: Hotfix analytics deploy (monthly sqoop) [analytics/refinery@584ed6a]
  • 08:01 moritzm: installing edk2 security updates
  • 07:31 moritzm: installing libimage-exiftool-perl security updates

2021-05-02

  • 13:40 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host
  • 13:40 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cloudmetrics1002.eqiad.wmnet with reason: Flaky host

2021-05-01

  • 19:12 Urbanecm: Invalidate password for MaraBot@SUL (T281586)
  • 16:58 legoktm@deploy1002: Synchronized logos/config.yaml: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 16:56 legoktm@deploy1002: Synchronized wmf-config/logos.php: Use eswiki 20th anniversary logos (T280908) (duration: 00m 56s)
  • 16:50 legoktm@deploy1002: Synchronized static/images/project-logos/: Add eswiki 20th anniversary logos (duration: 00m 57s)
  • 07:22 elukey: powercycle elastic2033 - no ssh, no tty available via mgmt

