Server Admin Log

From Wikitech

2020-02-15

  • 01:01 cdanis: ✔️ cdanis@an-coord1001.eqiad.wmnet ~ 🕗🍺 sudo systemctl restart hive-server2.service ; sudo systemctl restart hive-metastore.service

2020-02-14

  • 23:42 XenoRyet: updated civicrm from cf86495d44 to 8c77e9e915
  • 21:01 volker-e@deploy1001: Finished deploy [design/style-guide@1928c00]: Deploy design/style-guide: (duration: 00m 09s)
  • 21:01 volker-e@deploy1001: Started deploy [design/style-guide@1928c00]: Deploy design/style-guide:
  • 20:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: Prevent some logspam T245280 (duration: 01m 05s)
  • 19:27 XenoRyet: updated civicrm from 55b2afb6eb to cf86495d44
  • 19:10 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/Wikibase: T245062 Prevent invalid term languages from cached PrefetchingTermLookup (duration: 01m 09s)
  • 17:37 jforrester@deploy1001: Unlocked for deployment [ALL REPOSITORIES]: Testing T245062 fix on mwdebug1001 (duration: 03m 05s)
  • 17:33 jforrester@deploy1001: Locking from deployment [ALL REPOSITORIES]: Testing T245062 fix on mwdebug1001 (planned duration: 60m 00s)
  • 16:11 moritzm: installing git-lfs updates from Buster 10.3 point update
  • 15:55 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb10u1 to apt.wikimedia.org
  • 15:55 bblack: (log(n))
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2086:3318 T239453', diff saved to https://phabricator.wikimedia.org/P10414 and previous config saved to /var/cache/conftool/dbconfig/20200214-155443-marostegui.json
  • 15:52 moritzm: uploaded pypuppetdb 0.3.3-2~wmf+deb9u1 to apt.wikimedia.org
  • 15:46 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Resync initialisesetting to try and pick up previously deployed cirrus query routing changes (duration: 01m 05s)
  • 15:42 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:42 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:32 effie: restart mc-gp* for updates
  • 15:17 bd808: Toil reduction: !log messages now work from the SRE team's Freenode channel.
  • 13:50 gehel: restart relforge for JVM upgrade - T245120
  • 10:35 vgutierrez: revert ats 8.0.6-rc0 experiment on cp40[26,32]
  • 10:14 vgutierrez: rolling restart of ats-be to enable TLSv1.3 against origin servers - T170567
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10409 and previous config saved to /var/cache/conftool/dbconfig/20200214-093456-marostegui.json
  • 09:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:25 volans: manually absented /usr/local/bin/apt2xml on the 5 hosts with puppet disabled
  • 09:15 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:15 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:12 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:12 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:05 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:46 moritzm: installing 4.19.98 kernel update on Buster systems
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 100 for 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10408 and previous config saved to /var/cache/conftool/dbconfig/20200214-080600-marostegui.json
  • 06:51 vgutierrez: updating puppet compiler facts
  • 01:27 dpifke@deploy1001: Finished deploy [performance/navtiming@2eec00a]: (no justification provided) (duration: 00m 05s)
  • 01:27 dpifke@deploy1001: Started deploy [performance/navtiming@2eec00a]: (no justification provided)
  • 00:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T245202 cirrus: Move all move_like traffic to codfw (duration: 01m 02s)
  • 00:51 jforrester@deploy1001: Synchronized wmf-config/PoolCounterSettings.php: T245202 cirrus: Increase the pool counter limits a bit (duration: 01m 05s)

2020-02-13

  • 22:13 jeh: running filesystem tests on cloudvirt1024 T241884
  • 21:42 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 21:41 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
  • 21:40 jbond42: refresh facts on compilers
  • 21:38 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 21:37 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
  • 21:35 ottomata: deploying production and canary releases for eventgate-logging-external (and destroying the 'logging-external' release) (safe because eventgate-logging-external is not in use) - T245203
  • 21:29 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'canary' .
  • 21:28 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'production' .
  • 20:33 marxarelli: rollback to group1 due to 500 spike (2k/min) (T233867)
  • 20:32 dduvall@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 20:30 marxarelli: varnish 500 spike. rolling back
  • 20:20 gehel: restarting blazegraph + updater on wdqs2006
  • 20:19 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.19
  • 19:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/api/ApiRollback.php: T245159 ApiRollback: Properly deal with UserIdentity (duration: 01m 04s)
  • 19:20 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/resourceloader/ResourceLoaderSkinModule.php: T245182 ResourceLoaderSkinModule: Don't hard-deprecate wgLogoHD just now (duration: 01m 03s)
  • 19:17 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T219534 Add new MLR models for Cirrus on zh/ja/kowiki (duration: 01m 03s)
  • 19:10 moritzm: installing e2fsprogs security updates
  • 18:48 bblack: ns1.wikimedia.org - re-routing back to authdns2001 instead of dns2002 on cr[12]-codfw - T242017
  • 18:38 bblack: authdns2001 - reboot - T242017
  • 18:36 bblack: ns1.wikimedia.org - re-routing from authdns2001 to dns2002 on cr[12]-codfw - T242017
  • 18:09 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: I9d0c8af3c577 (duration: 01m 06s)
  • 18:00 krinkle@deploy1001: Synchronized wmf-config/etcd.php: Iae1f45896 (duration: 01m 06s)
  • 17:59 volans: downtimed mgmt in eqiad for 1h
  • 17:58 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: Iae1f45896 (duration: 01m 08s)
  • 17:49 krinkle@deploy1001: Synchronized wmf-config/etcd.php: Ibfca686f681 (duration: 01m 06s)
  • 17:41 krinkle@deploy1001: Synchronized wmf-config/etcd.php: Iefff596955e (duration: 01m 08s)
  • 17:40 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: Iefff596955e (duration: 01m 06s)
  • 17:35 krinkle@deploy1001: Synchronized wmf-config/etcd.php: I2e4fb0 (duration: 01m 06s)
  • 17:32 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: I2e4fb0 (duration: 01m 06s)
  • 17:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: No-op (code style only) deploy sync (duration: 01m 07s)
  • 17:09 jforrester@deploy1001: sync aborted: wmf-config/CommonSettings.php No-op (code style only) deploy sync (duration: 00m 04s)
  • 17:09 jforrester@deploy1001: Started scap: wmf-config/CommonSettings.php No-op (code style only) deploy sync
  • 16:32 robh: ps1-a8-codfw.mgmt.codfw.wmnet firmware upgraded via T245164
  • 16:28 papaul: rebooting elastic2043 for firmware upgrade
  • 16:22 gehel: canceled the restart of elastic2043 - T243715
  • 16:21 gehel: restarting elastic2043 - T243715
  • 16:10 _joe_: depooling/repooling mw1240
  • 16:02 _joe_: pooled mw1238 again
  • 15:59 _joe_: depooling mw1238 for analysis
  • 15:42 vgutierrez: rolling restart of ats-be on esams - T170567
  • 15:38 vgutierrez: disable allow_half_open on ats-tls @ cp4031 - T236458
  • 15:27 vgutierrez: turning on TLSv1.3 between ats-be and applayer in cp30[51-52] - T170567
  • 15:22 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/extensions/WikibaseMediaInfo/resources/: UBN fix: Force non-value to be undefined (duration: 01m 06s)
  • 14:51 vgutierrez: test TLSv1.3 between ats-be and applayer in cp3050 - T170567
  • 14:47 XioNoX: re-image rpki2001 - T244585
  • 14:33 XioNoX: add routinator_0.6.4_amd64.deb to buster-wikimedia apt repo
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10405 and previous config saved to /var/cache/conftool/dbconfig/20200213-142735-marostegui.json
  • 14:24 XioNoX: re-enable ping offload in esams - T244584
  • 13:31 XioNoX: disable ping offload in esams - T244584
  • 13:24 XioNoX: re-enable ping offload in eqiad - T244584
  • 13:06 XioNoX: disable ping offload in eqiad - T244584
  • 13:03 XioNoX: re-enable ping offload in codfw - T244584
  • 13:00 vgutierrez: pool cp10[75,76] running buster - T242093
  • 12:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:49 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:49 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:47 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:34 Amir1: EU SWAT is done
  • 12:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Read and write more in the new term store, take II, the cache issue (T219123 T225055) (duration: 01m 03s)
  • 12:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Read and write more in the new term store (duration: 01m 03s)
  • 12:29 vgutierrez: depool cp10[75,76] and reimage as buster - T242093
  • 12:28 vgutierrez: pool cp10[77,78] running buster - T242093
  • 12:20 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Triple the factor of WDQS lag to maxlag for Wikidata (T244722) (duration: 01m 04s)
  • 12:18 XioNoX: re-image ping2001 to buster - T244584
  • 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 1c81925: Create Test Custodians group at Beta Wikiversity (T240438) (duration: 01m 07s)
  • 12:13 XioNoX: disable ping offload in codfw
  • 12:13 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:13 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:13 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0f035e4: Update wgAvailableRights declaration of autoreviewprotected (T230103) (duration: 01m 03s)
  • 12:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 176b0e8: Grant autopatrol to azwiki patrollers (T244338) (duration: 01m 05s)
  • 11:53 vgutierrez: depool cp10[77,78] and reimage as buster - T242093
  • 11:52 vgutierrez: pool cp10[79,80] running buster - T242093
  • 11:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:37 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:18 vgutierrez: rolling upgrade of ATS to version 8.0.5-1wm16 fleet wide - T244464
  • 11:16 vgutierrez: depool cp10[79,80] and reimage as buster - T242093
  • 11:12 ema: A:cp re-enable puppet, leave it to cron to apply wikimedia-common/wikimedia-frontend VCL merge T241239
  • 11:08 vgutierrez: upload trafficserver 8.0.5-1wm16 to apt.wm.o (buster) - T244464
  • 11:02 vgutierrez: pool cp10[81,82] and reimage as buster - T242093
  • 10:59 ema: cp4021 (cache_upload): apply wikimedia-common/wikimedia-frontend VCL merge T241239
  • 10:49 ema: cp4027 (cache_text): apply wikimedia-common/wikimedia-frontend VCL merge T241239
  • 10:46 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:41 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:23 vgutierrez: removing /root/.ssh/known_hosts in cumin1001
  • 10:21 vgutierrez: pool cp10[83,84] running buster - T242093
  • 10:08 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:06 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:45 vgutierrez: depool cp10[83,84] and reimage as buster - T242093
  • 09:45 vgutierrez: pool cp10[85,86] running buster - T242093
  • 09:10 moritzm: installing Java security updates on elastic* and relforge*
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight for db1107 50 -> 100 for 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10403 and previous config saved to /var/cache/conftool/dbconfig/20200213-085957-marostegui.json
  • 08:57 gehel: restart elasticsearch on elastic2051 - JVM upgrade
  • 08:21 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:18 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 08:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:57 moritzm: installing Java security updates on Hadoop, Kafka/Jumbo, AQS and Druid canaries
  • 07:57 vgutierrez: depool cp10[85,86] and reimage as buster - T242093
  • 07:53 moritzm: rolling restart of restbase-dev to pick up Java security update
  • 07:49 vgutierrez: pool cp10[87,88] running buster - T242093
  • 07:49 vgutierrez: testing ATS 8.0.5-1wm16 + KA between ats-tls and varnish-fe in cp4031 - T244464
  • 07:47 moritzm: installing Java security updates on stat/SWAP hosts
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 50 for 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10402 and previous config saved to /var/cache/conftool/dbconfig/20200213-072839-marostegui.json
  • 07:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:21 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:03 vgutierrez: depool cp10[87,88] and reimage as buster - T242093
  • 07:02 vgutierrez: pool cp10[89,90] running buster - T242093
  • 06:49 vgutierrez: pool cp20[02,05] running buster - T242093
  • 06:36 marostegui: Upgrade and compress db1087, this will generate lag on s8 on the wiki replicas - T232446
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P10401 and previous config saved to /var/cache/conftool/dbconfig/20200213-063535-marostegui.json
  • 06:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1099:3318 into vslow for s8 T239453', diff saved to https://phabricator.wikimedia.org/P10400 and previous config saved to /var/cache/conftool/dbconfig/20200213-063334-marostegui.json
  • 06:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1099:3318, db1099:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10399 and previous config saved to /var/cache/conftool/dbconfig/20200213-063207-marostegui.json
  • 06:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10398 and previous config saved to /var/cache/conftool/dbconfig/20200213-062642-marostegui.json
  • 06:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10397 and previous config saved to /var/cache/conftool/dbconfig/20200213-062148-marostegui.json
  • 06:19 vgutierrez: testing a new build of ATS 8.0.6 in cp40[26,32]
  • 06:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1099:3318, db1099:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10396 and previous config saved to /var/cache/conftool/dbconfig/20200213-061219-marostegui.json
  • 06:11 vgutierrez: depool cp10[89,90] and reimage as buster - T242093
  • 06:04 vgutierrez: depool cp20[02,05] and reimage as buster - T242093
  • 06:04 vgutierrez: pool cp20[01,08] running buster - T242093
  • 06:02 twentyafterfour: set phabricator read-only to false
  • 06:01 twentyafterfour: set phabricator read-only
  • 06:00 marostegui: Start phabricator maintenance T244566
  • 05:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:53 marostegui: Upgrade db1128 without restarting mysql - T244566
  • 05:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:47 marostegui: Silence m3 hosts for maintenance - T244566
  • 05:38 vgutierrez: depool cp2008 and reimage as buster - T242093
  • 05:37 vgutierrez: pool cp2011 running buster - T242093
  • 05:35 vgutierrez: depool cp2001 and reimage as buster - T242093
  • 05:34 vgutierrez: pool cp2004 running buster - T242093
  • 05:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 05:25 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 05:09 vgutierrez: depool cp20[04,11] and reimage as buster - T242093
  • 03:57 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:57 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:54 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:32 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 03:30 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:28 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:06 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:02 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:00 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 02:44 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 02:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 02:35 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 02:33 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:22 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:20 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:18 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:17 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:10 twentyafterfour: no apparent problems with phabricator upgrade, all done
  • 01:01 twentyafterfour: starting phabricator deploy, momentary downtime expected while apache restarts
  • 00:58 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:56 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:54 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:45 niharika29@deploy1001: Synchronized wmf-config/throttle.php: Throttle rule for National Gallery of Canada Library and Archives edit-a-thon - T244488 (duration: 01m 07s)
  • 00:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:34 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:32 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:31 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:11 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:08 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:06 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-02-12

  • 23:46 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:44 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:43 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:11 XioNoX: deactivate BGP to office's router1 while it's on maintenance
  • 21:59 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:58 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:57 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:53 chaomodus: restart nagios-nrpe-service on cumin1001 after it had oomed
  • 21:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:51 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'canary' .
  • 21:18 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'production' .
  • 21:10 marxarelli: completed group1 to 1.35.0-wmf.19
  • 21:00 dduvall@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.19 (duration: 01m 03s)
  • 20:59 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.19
  • 20:49 krinkle@deploy1001: Synchronized wmf-config/CommonSettings.php: T232563 - Remove SERVER_SOFTWARE override (duration: 01m 03s)
  • 20:39 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T72470 - Disable wgLegacyJavaScriptGlobals on svwiki (duration: 01m 08s)
  • 19:53 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Don't use hex escapes in the name of cawiki (duration: 01m 04s)
  • 19:47 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T243503 [itwiki] Move assignment of 'mover' group from sysops to bureaucrats (duration: 01m 02s)
  • 19:42 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T243509 [zh_classicalwiki] Enable new user message for auto-created accounts (duration: 01m 03s)
  • 19:38 James_F: Ran mwscript maintenance/namespaceDupes.php --wiki=mywiki --fix and mwscript maintenance/namespaceDupes.php --wiki=mywiktionary --fix on mwmaint1002
  • 19:37 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T244980 Localise $wgMetaNamespace for mywiki and mywiktionary (duration: 01m 03s)
  • 19:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T244205 [newiki] Set local timezone to Kathmandu (duration: 01m 03s)
  • 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T241883 [fywiktionary] Set a local wgSitename (duration: 01m 03s)
  • 19:12 jforrester@deploy1001: Synchronized wmf-config/throttle-analyze.php: Replace deprecated IP class with IPUtils (no-op sync) (duration: 01m 03s)
  • 18:31 mutante: irc2001 - manually run the "${v6_token_cmd} && ${v6_flush_dyn_cmd}" commands from interface::add_ip6_mapped to debug 'Interface::Add_ip6_mapped[main]/Augeas[ens5_v6_token]: Could not evaluate: Saving failed' but it does not reproduce the puppet error ... (T244719)
  • 17:57 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/pager/IndexPager.php: T244941 IndexPager: Cast properties passed to implode to arrays (duration: 01m 03s)
  • 17:27 jeh: upgrade RAID firmware on cloudvirt1024 to 25.5.6.0009 T241884
  • 17:22 bblack: ns1.wikimedia.org - re-route back to original authdns2001 destination
  • 17:11 brennen: restarting jenkins for updates
  • 17:09 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - T244464
  • 17:01 vgutierrez: rolling back cp4026 and cp4032 to trafficserver 8.0.5-1wm15
  • 17:00 vgutierrez: depool cp40[26,32]
  • 16:53 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:52 vgutierrez: pool cp20[06,14] running buster - T242093
  • 16:51 bblack@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:49 moritzm: installing openjpeg2 security updates
  • 16:10 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:56 vgutierrez: Enable KA and disable parent proxies on cp4031 - T244464
  • 15:50 vgutierrez: depool cp20[06,14] and reimage as buster - T242093
  • 15:49 volans: spicerack upgraded to 0.0.30-1 on both cumin hosts
  • 15:48 vgutierrez: pool cp20[07,17] running buster - T242093
  • 15:46 bblack: authdns2001 - shutting down for hardware work - T242017
  • 15:40 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:39 jeh: clearing foreign drive RAID configuration on cloudvirt1024 T241884
  • 15:37 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:32 marostegui: Disable event handler for db1095 RAID check on icinga - T244958
  • 15:32 marostegui: Disable event handler for db1095 RAID check on icinga -
  • 15:28 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:26 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 jeh: upgrade BIOS firmware on cloudvirt1024 to 2.4.8 T241884
  • 15:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:02 vgutierrez: depool cp20[07,17] and reimage as buster - T242093
  • 14:34 XioNoX: repool eqsin
  • 14:31 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
  • 14:00 vgutierrez: pool cp20[10,18] running buster - T242093
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10393 and previous config saved to /var/cache/conftool/dbconfig/20200212-135514-marostegui.json
  • 13:39 akosiaris: revert sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for T243106
  • 13:36 XioNoX: re-enable transit/peering on cr1-eqsin - T244944
  • 13:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:23 akosiaris: mangle sessionstore on mw1331, mw1348 so that it times out instead of returning TCP RSTs. Testing for T243106
  • 13:23 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:22 XioNoX: cr1-eqsin RE failover (final) - T244944
  • 13:21 marostegui: Restart wikibugs as phab comments aren't showing up on irc - T241109
  • 13:20 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:18 jynus: setting up db1140 under maintenance (upgrade, reboot, disable alerts)
  • 13:15 vgutierrez: disabling KA between ats-tls and varnish-fe on cp4031 - T244464
  • 13:10 moritzm: upgrading debdeploy fleet-wide to 0.0.99.13
  • 13:08 moritzm: uploaded libapache2-mod-auth-cas 1.2-1~deb8u1 for jessie-wikimedia to apt.wikimedia.org
  • 13:05 vgutierrez: depool cp20[10,18] and reimage as buster - T242093
  • 13:05 vgutierrez: pool cp20[12,20] running buster - T242093
  • 12:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:53 XioNoX: cr1-eqsin RE failover - T244944
  • 12:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:35 vgutierrez: depool cp20[12,20] and reimage as buster - T242093
  • 12:34 vgutierrez: pool cp20[13,22] running buster - T242093
  • 12:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:21 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Triple the factor of WDQS lag to maxlag for Wikidata (T244722), take II, the cache issue (duration: 01m 03s)
  • 12:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:19 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Triple the factor of WDQS lag to maxlag for Wikidata (T244722) (duration: 01m 04s)
  • 12:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:12 kartik@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 571412|Enable ContentTranslation out of beta in bs and mk WPs (T244139, T244140) (duration: 01m 15s)
  • 12:08 vgutierrez: depool cp2013 and reimage as buster - T242093
  • 12:06 vgutierrez: pool cp2016 running buster - T242093
  • 12:01 vgutierrez: depool cp20[16,22] and reimage as buster - T242093
  • 11:57 vgutierrez: pool cp20[19,24] running buster - T242093
  • 11:53 akosiaris: mangle sessionstore on mw1331 so that it is unreachable. Testing for T243106
  • 11:49 vgutierrez: repooling cp40[26,32]
  • 11:39 vgutierrez: pool cp3050 running buster - T242093
  • 11:37 vgutierrez: depooling cp[4026,4032]
  • 11:35 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:33 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:18 vgutierrez: depool cp2024 and reimage as buster - T242093
  • 11:17 vgutierrez: pool cp2025 running buster - T242093
  • 11:16 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:15 vgutierrez: depool cp2016 and reimage as buster - T242093
  • 11:14 vgutierrez: pool cp2019 running buster - T242093
  • 11:11 moritzm: reimage logstash2026 to test new standard RAID0 partman recipe
  • 11:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:00 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:50 vgutierrez: depool cp3050 and reimage as buster - T242093
  • 10:49 vgutierrez: pool cp30[51,52] running buster - T242093
  • 10:45 vgutierrez: depool cp20[19,25] and reimage as buster - T242093
  • 10:42 vgutierrez: pool cp2026 running buster - T242093
  • 10:36 vgutierrez: pool cp2023 running buster - T242093
  • 10:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:34 moritzm: bouncing ferm on ganeti1016, failed to start after boot
  • 10:32 vgutierrez: Enable KA between ats-tls and varnish-fe on cp4031 - T244464
  • 10:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:31 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:27 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:12 vgutierrez: testing trafficserver 8.0.6-rc0 in cp40[26,32]
  • 10:06 vgutierrez: depool cp20[23,26] and reimage as buster - T242093
  • 10:01 vgutierrez: depool cp30[51-52] and reimage as buster - T242093
  • 09:38 ema: cp: rolling ats-tls-restart to enable analytics logging T237993
  • 09:26 ema: cp4027: ats-tls-restart to enable analytics logging to pipe T237993
  • 09:25 moritzm: rolling restart of cassandra on restbase-dev to pick up Java security updates
  • 09:17 marostegui: Failover m2 master dbproxy from dbproxy1007 to dbproxy1013 - T202367
  • 09:13 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 09:11 marostegui: Upgrade and reboot dbproxy1013 before making it master - T202367
  • 08:55 elukey@cumin1001: START - Cookbook sre.ganeti.makevm
  • 08:46 phedenskog@deploy1001: Finished deploy [performance/navtiming@9bbbb58]: (no justification provided) (duration: 00m 05s)
  • 08:46 phedenskog@deploy1001: Started deploy [performance/navtiming@9bbbb58]: (no justification provided)
  • 08:38 marostegui: Restart wikibugs as it doesn't show phab comments on irc - T241109
  • 08:21 moritzm: installing mesa security updates
  • 07:28 vgutierrez: pool cp30[53-54] running buster - T242093
  • 07:18 oblivian@puppetmaster1001: conftool action : set/weight=30; selector: dc=eqiad,pool=appserver,name=mw132[3-4].*
  • 07:16 oblivian@puppetmaster1001: conftool action : set/weight=20; selector: dc=eqiad,pool=appserver,service=nginx,name=mw12[3-5].*
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 with weight 20 for 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10391 and previous config saved to /var/cache/conftool/dbconfig/20200212-070250-marostegui.json
  • 06:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:46 marostegui: Redact ngwikimedia on db1124:3313 and db2094:3313 T240772
  • 06:22 vgutierrez: depool cp30[53-54] and reimage as buster - T242093
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 06:17 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 06:16 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.decommission
  • 01:48 XioNoX: disabling peering session on cr1-eqsin (they're flapping otherwise)
  • 00:44 jforrester@deploy1001: Synchronized php-1.35.0-wmf.19/includes/page/ImageHistoryPseudoPager.php: T244937 ImageHistoryPseudoPager: Update doQuery() for IndexPager changes (duration: 01m 03s)
  • 00:38 XioNoX: reboot cr1-eqsin
  • 00:33 XioNoX: commit full on cr1-eqsin - T243080
  • 00:21 reedy@deploy1001: Synchronized wmf-config/CommonSettings.php: rm wgKartographerIconServer (duration: 01m 02s)
  • 00:20 reedy@deploy1001: Synchronized wmf-config/CommonSettings-labs.php: rm wgKartographerIconServer (duration: 01m 03s)
  • 00:16 eileen: civicrm revision changed from ee9edf8137 to 55b2afb6eb, config revision is 561ae21f77

2020-02-11

  • 22:04 XioNoX: switchover RE mastership back re0 on cr1-eqsin - T243080
  • 21:50 XioNoX: reboot re0:cr1-eqsin (backup) - T243080
  • 21:45 cdanis: repool eqiad
  • 21:37 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp107.*
  • 21:36 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=^cp108.*
  • 21:36 bblack: re-pooling all cp10xx in eqiad
  • 21:32 XioNoX: switchover RE mastership on cr1-eqsin - T243080
  • 21:14 robh: cp1067 powered back into service post firmware update via T243167
  • 21:11 cdanis: depool eqiad
  • 21:01 marxarelli: completed group0 to 1.35.0-wmf.19 (T233867)
  • 20:57 robh: cp108[45] returned to service, depooling cp108[67] for firmware update via T243167
  • 20:54 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.19
  • 20:53 mutante: gerrit - moving gerrit db_pass from private module passwords to private hieradata
  • 20:51 XioNoX: reboot backup RE on cr1-eqsin - T243080
  • 20:38 robh: depooling cp108[45] for firmware update via T243167
  • 20:32 dduvall@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache (duration: 37m 31s)
  • 20:19 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 02s)
  • 20:19 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
  • 20:18 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 03s)
  • 20:18 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
  • 20:08 XioNoX: depool eqsin for router upgrade - T243080
  • 20:01 volker-e@deploy1001: Finished deploy [design/style-guide@dd8e6de]: Deploy design/style-guide: (duration: 00m 04s)
  • 20:01 volker-e@deploy1001: Started deploy [design/style-guide@dd8e6de]: Deploy design/style-guide:
  • 19:55 dduvall@deploy1001: Started scap: testwiki to php-1.35.0-wmf.19 and rebuild l10n cache
  • 19:43 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.16 (duration: 01m 48s)
  • 19:42 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.15 (duration: 01m 51s)
  • 19:38 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.14 (duration: 02m 08s)
  • 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.35.0-wmf.11 (duration: 10m 53s)
  • 19:35 marxarelli: running `scap clean --delete` for old wmf branches wmf.11, wmf.14, wmf.15, wmf.16 (T233867)
  • 19:03 volans: uploaded spicerack_0.0.30-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 19:00 Urbanecm: Create User:Ammarpad on ngwikimedia and promote to sysop, bureaucrat (T240771)
  • 18:48 jforrester@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18
  • 18:43 twentyafterfour: getting ready to deploy wmf.18 refs T233866
  • 18:42 greg-g: restarting stashbot
  • 18:35 bblack: ns1.wikimedia.org - changing static route destination on cr[12]-codfw from authdns2001 to dns2002 - T242017
  • 18:33 Urbanecm: Create ngwikimedia is done (T240771)
  • 18:26 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Create ngwikimedia (T240771) (duration: 01m 03s)
  • 18:24 urbanecm@deploy1001: Synchronized multiversion/MWMultiVersion.php: Create ngwikimedia (T240771) (duration: 01m 06s)
  • 18:21 urbanecm@deploy1001: rebuilt and synchronized wikiversions files: Create ngwikimedia (T240771)
  • 18:20 dpifke@deploy1001: Finished deploy [performance/navtiming@b471b64]: (no justification provided) (duration: 00m 05s)
  • 18:20 dpifke@deploy1001: Started deploy [performance/navtiming@b471b64]: (no justification provided)
  • 18:19 urbanecm@deploy1001: Synchronized dblists/: Create ngwikimedia (T240771) (duration: 01m 06s)
  • 17:57 bblack: reboot dns2002 post-reimaging
  • 17:13 vgutierrez: Disable KA on cp4031 - T244464
  • 16:49 vgutierrez: pool cp3055 running buster - T242093
  • 16:43 vgutierrez: repooling cp4031
  • 16:38 vgutierrez: depooling cp4031 for some KA tests
  • 16:25 vgutierrez: pool cp3056 running buster - T242093
  • 16:23 bblack: dns2002 - shutting down for hardware work and reinstall - T242017
  • 16:21 bblack: dns2002 - stopping bird adverts to depool service for T242017
  • 16:20 bblack: dns2002 - downtimed in icinga for T242017
  • 16:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:05 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:38 vgutierrez: depool cp3056 and reimage as buster - T242093
  • 15:36 vgutierrez: pool cp3058 running buster - T242093
  • 15:29 otto@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Configuring test.event stream in beta, no-op in prod - T242122 (duration: 01m 08s)
  • 15:24 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:58 vgutierrez: depool cp3055 and reimage as buster - T242093
  • 14:56 vgutierrez: pool cp3057 running buster - T242093
  • 14:52 moritzm: pruning old CAS logs (predating the current logger config for /var/log/cas/*) from idp1001/idp2001
  • 14:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:35 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:21 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=labswiki --force "Ladsgroup" --custom-groups checkuser
  • 14:20 vgutierrez: restart varnish-fe on cp4031 - T244464
  • 14:07 vgutierrez: depool cp3057 and cp3058 and reimage as buster - T242093
  • 13:52 vgutierrez: pool cp3059 and cp3060 running buster - T242093
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10382 and previous config saved to /var/cache/conftool/dbconfig/20200211-130343-marostegui.json
  • 12:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:34 Amir1: EU SWAT is done
  • 12:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:28 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Fix typo in the config name (T244697), take II, cache (duration: 01m 06s)
  • 12:26 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Fix typo in the config name (T244697) (duration: 01m 05s)
  • 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop reading for the new term store as the default of client wikis (T244697), Second round, cache issue (duration: 01m 07s)
  • 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop reading for the new term store as the default of client wikis (T244697) (duration: 01m 11s)
  • 12:04 vgutierrez: depool cp3059 and cp3060 and reimage as buster - T242093
  • 11:59 vgutierrez: repool cp3061 and cp3062 running buster - T242093
  • 11:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:24 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:20 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:20 vgutierrez: ats-tls effectively reusing connections between ats-tls and varnish-fe on cp4031 - T244464
  • 11:18 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:56 vgutierrez: depool cp3062 and reimage as buster - T242093
  • 10:54 vgutierrez: repool cp3064 running buster - T242093
  • 10:51 vgutierrez: depool cp3061 and reimage as buster - T242093
  • 10:50 vgutierrez: repool cp5006 and cp3063 running buster - T242093
  • 10:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:28 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:25 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:23 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:18 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 10:11 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 10:07 vgutierrez: rolling restart of ats-tls in ulsfo - T244464
  • 09:57 vgutierrez: depool cp3063 and cp3064 and reimage as buster - T242093
  • 09:52 vgutierrez: depool cp5006 and reimage as buster - T242093
  • 09:52 vgutierrez: pool cp5007 running buster - T242093
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Increase db1107 weight from 10 to 11', diff saved to https://phabricator.wikimedia.org/P10380 and previous config saved to /var/cache/conftool/dbconfig/20200211-083812-marostegui.json
  • 08:25 marostegui: Upgrade db1095:3312, db1095:3313
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10379 and previous config saved to /var/cache/conftool/dbconfig/20200211-082204-marostegui.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10378 and previous config saved to /var/cache/conftool/dbconfig/20200211-081421-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 5 to 10 for db1107 - T242702', diff saved to https://phabricator.wikimedia.org/P10377 and previous config saved to /var/cache/conftool/dbconfig/20200211-081319-marostegui.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10376 and previous config saved to /var/cache/conftool/dbconfig/20200211-080458-marostegui.json
  • 07:57 akosiaris: T242705 systemctl stop uwsgi-ores on ores2001.
  • 07:54 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 07:54 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1013 after upgrade', diff saved to https://phabricator.wikimedia.org/P10375 and previous config saved to /var/cache/conftool/dbconfig/20200211-075358-marostegui.json
  • 07:47 marostegui: Upgrade es1013 - T239791
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 - T239791', diff saved to https://phabricator.wikimedia.org/P10374 and previous config saved to /var/cache/conftool/dbconfig/20200211-074358-marostegui.json
  • 07:23 vgutierrez: depool cp5007 and reimage as buster - T242093
  • 07:22 vgutierrez: pool cp5001 and cp5008 running buster - T242093
  • 07:21 marostegui: Remove partitions from db2086:3318 - T239453
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2086:3318 T239453', diff saved to https://phabricator.wikimedia.org/P10373 and previous config saved to /var/cache/conftool/dbconfig/20200211-071936-marostegui.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3318 T239453', diff saved to https://phabricator.wikimedia.org/P10372 and previous config saved to /var/cache/conftool/dbconfig/20200211-071639-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1107 for 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10371 and previous config saved to /var/cache/conftool/dbconfig/20200211-070720-marostegui.json
  • 07:01 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:59 marostegui: Stop haproxy on dbproxy1001 - T244463
  • 06:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 06:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:57 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 06:48 marostegui: Remove grants in m1 for dbproxy1001 - T231280
  • 06:25 vgutierrez: depool cp5001 & cp5008 and reimage as buster - T242093
  • 06:18 marostegui: Failover m1-master from dbproxy1014 to dbproxy1012 - T202367
  • 00:26 ebernhardson@deploy1001: Synchronized php-1.35.0-wmf.18/skins/MinervaNeue: SWAT: Revert: Reduce userContributions icon code (duration: 01m 06s)
  • 00:20 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Give NS_HELP same weight as NS_MAIN in search on wikitech (duration: 01m 06s)
  • 00:15 ebernhardson@deploy1001: Synchronized wmf-config/: SWAT: Enable SpecialMute page on all wikis (duration: 01m 06s)

2020-02-10

  • 23:30 robh: cp108[23] returned to service via T243167
  • 23:28 legoktm: restarting zuul
  • 23:26 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/OATHAuth/src/Key/TOTPKey.php: T244308 (duration: 01m 04s)
  • 23:25 reedy@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/OATHAuth/src/Key/TOTPKey.php: T244308 (duration: 01m 07s)
  • 23:06 robh: cp108[01] returned to service, cp108[23] offline for bios update via T243167
  • 22:50 chasemp: phab1001:~# sudo /srv/phab/phabricator/bin/bulk make-silent --id 2164
  • 22:45 sbassett@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add authevents as monolog channel (duration: 01m 06s)
  • 22:43 robh: cp107[789] returned to service, cp108[01] offline for bios update via T243167
  • 22:42 robh: cp107[89] returned to service, cp108[01] offline for bios update via T243167
  • 21:58 robh: cp107[56] returned to service, cp107[78] offline for bios update via T243167
  • 21:43 arlolra: Updated Parsoid to 612106d2 (T244412, T244413, T242746, T235273, T235307, T238845, T204618, T240054)
  • 21:38 robh: cp1075 & cp1076 offline for bios updates per T243167
  • 21:36 robh: cp1075 and cp1076 going offline for bios updates. This will cause a bit of cp irc icinga noise, but no paging. Not putting into maint mode, as there is no way to maint mode the noisiest check (which checks all backends and thus shouldn't be disabled)
  • 21:33 arlolra@deploy1001: Finished deploy [parsoid/deploy@d2d4870]: Updating Parsoid to 612106d2 (duration: 10m 26s)
  • 21:32 XioNoX: clamp tcp-mss on cr2-eqiad:xe-3/3/3
  • 21:23 arlolra@deploy1001: Started deploy [parsoid/deploy@d2d4870]: Updating Parsoid to 612106d2
  • 21:12 halfak@deploy1001: Finished deploy [ores/deploy@a6f4f14]: T242705 (duration: 12m 18s)
  • 21:00 halfak@deploy1001: Started deploy [ores/deploy@a6f4f14]: T242705
  • 20:55 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results (T244752) (duration: 01m 11s)
  • 20:14 mholloway-shell@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: MachineVision: Fix page id parsing from imageinfo results (T244752) (duration: 01m 15s)
  • 19:31 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:570393 Config: Session Store: Switch group0 and group1 to kask-session T243106 (duration: 01m 06s)
  • 19:28 mutante: Gerrit - added eevans to 'wmf-deployment' group (T244508)
  • 19:12 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T242122 Load new EventStreamConfig extension if so configured (duration: 01m 06s)
  • 19:07 jforrester@deploy1001: Scap failed!: Call to mwscript eval.php stderr: not empty
  • 19:06 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T242122 Set default of wmgUseEventStreamConfig false everywhere (duration: 01m 06s)
  • 18:39 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18 refs T233866 (duration: 01m 05s)
  • 18:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18 refs T233866
  • 18:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs T233867
  • 18:21 twentyafterfour: MediaWiki train: finally moving forward with group0 wikis to 1.35.0-wmf.18 refs T233866
  • 17:52 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T244561 Set Kartographer servers to Wikimedia servers (duration: 01m 06s)
  • 16:48 moritzm: installing libexif security updates on jessie
  • 16:22 vgutierrez: pooling cp5002 and cp5009 running buster - T242093
  • 15:45 XioNoX: push outbound flowspec support to core routers
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 after first day of 10.4 testing - T242702', diff saved to https://phabricator.wikimedia.org/P10366 and previous config saved to /var/cache/conftool/dbconfig/20200210-154552-marostegui.json
  • 15:41 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:41 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:33 godog: roll restart cassandra on session* to apply logging changes - T242585
  • 15:23 moritzm: uploading debdeploy 0.0.99.13 to apt.wikimedia.org
  • 15:22 godog: roll restart cassandra on restbase* to apply logging changes - T242585
  • 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:19 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 15:19 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:06 marostegui: Reload haproxy on dbproxy1017 and dbproxy1017 - T244209
  • 15:04 twentyafterfour@deploy1001: Finished scap: full scap sync prior to wmf.18 rollout (duration: 20m 13s)
  • 15:04 godog: roll restart cassandra on maps* to apply logging changes - T242585
  • 15:03 vgutierrez: rolling restart of ats-tls - T240950
  • 15:00 marostegui: Restart mysql on m5 master (wikitech will go down) - T244209
  • 14:52 vgutierrez: rolling restart of ats-tls in ulsfo - T244464
  • 14:46 vgutierrez: depool cp5002 and cp5009 and reimage as buster - T242093
  • 14:44 twentyafterfour@deploy1001: Started scap: full scap sync prior to wmf.18 rollout
  • 14:42 vgutierrez: repool cp5003 and cp5010 running buster - T242093
  • 14:41 marostegui: Full-upgrade db1133 (without restarting mysql) - T244209
  • 14:40 twentyafterfour: MediaWiki Train: Running a full scap to prepare for moving forward to 1.35.0-wmf.18 ( T233866 )
  • 14:32 marostegui: Downtime m5 hosts for the upcoming maintenance - T244209
  • 14:19 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:17 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 14:17 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:11 XioNoX: remove TCP-MSS clamping on cr3-knams
  • 13:48 vgutierrez: depool cp5003 and reimage as buster - T242093
  • 13:47 vgutierrez: pooling cp5004 with buster - T242093
  • 13:46 vgutierrez: depool cp5010 and reimage as buster - T242093
  • 13:45 vgutierrez: pooling cp5011 with buster - T242093
  • 13:28 godog: roll restart cassandra on aqs to apply logging changes - T242585
  • 13:03 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase: Revert "wbterms: Set default for the term store to read new" (T244529) (duration: 01m 00s)
  • 13:03 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:59 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:58 Urbanecm: EU SWAT is done
  • 12:58 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 989c9f8: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 58s)
  • 12:51 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 989c9f8: Revert "Revert "Remove handler deleted from the MachineVision extension"" (duration: 00m 59s)
  • 12:49 urbanecm@deploy1001: Finished scap: SWAT: 799224f: 137a40e (T241242; T243974) (duration: 20m 18s)
  • 12:30 vgutierrez: depool cp5004 and reimage as buster - T242093
  • 12:29 vgutierrez: pooling cp5005 with buster - T242093
  • 12:28 urbanecm@deploy1001: Started scap: SWAT: 799224f: 137a40e (T241242; T243974)
  • 12:23 vgutierrez: pooling ncredir1001 with buster - T243391
  • 12:18 _joe_: running puppet, scap pull on mwdebug1001
  • 12:17 vgutierrez: upload trafficserver 8.0.5-1wm15 to apt.wm.o (buster) - T244538
  • 12:08 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 12:08 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:06 vgutierrez: testing ats 8.0.5-1-wm15 on cp4032 - T244538
  • 12:06 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 014405a: Add throttle rules for OSU Editathon and workshop for cawiki, remove expired ones (T244608, T244645) (duration: 01m 03s)
  • 11:57 vgutierrez: depool ncredir1001 and reimage as buster - T243391
  • 11:57 vgutierrez: pooling ncredir1002 with buster - T243391
  • 11:43 vgutierrez: pooling cp4027 with buster - T242093
  • 11:38 vgutierrez: depool ncredir1002 and reimage as buster - T243391
  • 11:31 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:29 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:22 vgutierrez: depooling cp5011 and cp5005 & reimage as buster - T242093
  • 11:07 vgutierrez: depool cp4027 & reimage as buster - T242093
  • 11:07 vgutierrez: pooling ncredir2001 with buster - T243391
  • 11:03 vgutierrez: pooling cp4028 with buster - T242093
  • 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:47 godog: remove old logs from /var/log/swift on swift hosts
  • 10:31 vgutierrez: depool ncredir2001 and reimage as buster - T243391
  • 10:26 vgutierrez: depool cp4028 & reimage as buster - T242093
  • 10:14 moritzm: installing sudo security updates for buster
  • 08:53 vgutierrez: pooling cp4029 with buster - T242093
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Increase weight from 1 to 5 for db1107 - T242702', diff saved to https://phabricator.wikimedia.org/P10364 and previous config saved to /var/cache/conftool/dbconfig/20200210-084446-marostegui.json
  • 08:43 vgutierrez: pooling ncredir2002 with buster - T243391
  • 08:34 effie: rolling restart php-fpm on labweb[1001-1002].wikimedia.org,mw*.eqiad.wmnet,scandium.eqiad.wmnet, wtp[1025-1048].eqiad.wmnet
  • 08:32 effie: update php-apcu on eqiad - T236800
  • 08:29 effie: rolling restart php-fpm on cloudweb2001-dev.wikimedia.org,mw[2135-2147,2150-2212,2214-2290].codfw.wmnet,wtp[2001-2020].codfw.wmnet
  • 08:23 effie: update php-apcu on codfw - T236800
  • 07:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 07:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:54 moritzm: updating d-i netinst image for Stretch 9.12 point release (which bumped the kernel ABI)
  • 07:29 moritzm: updating d-i netinst image for Buster 10.3 point release (which bumped the kernel ABI)
  • 07:09 elukey: restore mw1347's mcrouter settings to its default (proxy threads 10 -> 5)
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Place db1107 - MariaDB 10.4 on s1 with minimal weight - T242702', diff saved to https://phabricator.wikimedia.org/P10363 and previous config saved to /var/cache/conftool/dbconfig/20200210-070140-marostegui.json
  • 06:55 vgutierrez: depool ncredir2002 and reimage as buster - T243391
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool es1019', diff saved to https://phabricator.wikimedia.org/P10362 and previous config saved to /var/cache/conftool/dbconfig/20200210-065326-marostegui.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1091 T232446', diff saved to https://phabricator.wikimedia.org/P10361 and previous config saved to /var/cache/conftool/dbconfig/20200210-065135-marostegui.json
  • 06:47 vgutierrez: depool cp4029 & reimage as buster - T242093
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019', diff saved to https://phabricator.wikimedia.org/P10360 and previous config saved to /var/cache/conftool/dbconfig/20200210-064553-marostegui.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 T232446', diff saved to https://phabricator.wikimedia.org/P10359 and previous config saved to /var/cache/conftool/dbconfig/20200210-064458-marostegui.json
  • 06:39 marostegui: Compress db1124:3318 - this will generate lag on s8 wiki replicas - T232446
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 T232446', diff saved to https://phabricator.wikimedia.org/P10358 and previous config saved to /var/cache/conftool/dbconfig/20200210-063716-marostegui.json
  • 06:23 marostegui: Remove partitions from db1099:3311, db1099:3318 T239453
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3318 T239453', diff saved to https://phabricator.wikimedia.org/P10357 and previous config saved to /var/cache/conftool/dbconfig/20200210-062112-marostegui.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1099:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10356 and previous config saved to /var/cache/conftool/dbconfig/20200210-061822-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1091 T232446', diff saved to https://phabricator.wikimedia.org/P10355 and previous config saved to /var/cache/conftool/dbconfig/20200210-061656-marostegui.json

2020-02-09

  • 05:11 cdanis: T238305 hardreset cp3051

2020-02-08

  • 19:12 _joe_: set cpufreq governor to performance on mw1328
  • 17:04 _joe_: restarted php7.2-fpm on mw1332
  • 16:53 Urbanecm: mwscript resetAuthenticationThrottle.php --wiki=enwiki --signup --ip 12.24.27.50
  • 16:47 gjg@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Editathon in Charlotte (duration: 00m 58s)
  • 00:05 Jeff_Green: switched payments.wikimedia.org to codfw datacenter due to T244610

2020-02-07

  • 22:20 jeh: ceph: round 2 OSD failover and recovery testing on cloudcephosd1003.wikimedia.org T240718
  • 20:47 mutante: OS install on new install_server VMs worked on second attempt, issues are gone. signed puppet certs for install1003.eqiad.wmnet, install2003.codfw.wmnet, initial puppet runs (T224576)
  • 20:42 jeh: ceph: OSD failover and recovery testing on cloudcephosd1003.wikimedia.org T240718
  • 20:32 mutante: ganeti: attempting to reinstall install1003 which failed last time
  • 17:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance T243963', diff saved to https://phabricator.wikimedia.org/P10350 and previous config saved to /var/cache/conftool/dbconfig/20200207-173850-marostegui.json
  • 17:36 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync InitializeSettings again for lols refs T233866 (duration: 01m 03s)
  • 17:32 twentyafterfour@deploy1001: Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/570929 refs T233866 (duration: 01m 02s)
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool es1019 after on-site maintenance T243963', diff saved to https://phabricator.wikimedia.org/P10349 and previous config saved to /var/cache/conftool/dbconfig/20200207-172541-marostegui.json
  • 17:22 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: roll back all wikis to 1.35.0-wmf.16 refs T233866
  • 17:19 marostegui: Start MySQL on es1019 after onsite maintenance T243963
  • 16:46 filippo@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 16:38 filippo@cumin1001: START - Cookbook sre.ganeti.makevm
  • 16:13 XioNoX: remove MSS clamping from eqiad/eqord/knams/esams
  • 16:05 andrew@deploy1001: Finished deploy [horizon/deploy@bc777d6]: Fix for T243422 (duration: 03m 45s)
  • 16:04 vgutierrez: pooling cp4030 with buster - T242093
  • 16:03 bblack: removing GRE MTU mitigations from cp[135]xxx - T232602
  • 16:01 andrew@deploy1001: Started deploy [horizon/deploy@bc777d6]: Fix for T243422
  • 15:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:48 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:25 vgutierrez: depool & reimage cp4030 as buster - T242093
  • 15:21 vgutierrez: pooling cp4031 with buster - T242093
  • 15:20 vgutierrez: pooling ncredir3001 running buster - T243391
  • 15:18 marostegui: Restart all instances on db1124 and db1125 to pick up a new replication filter - T240094
  • 15:11 marostegui: Restart all instances on db2094 and db2095 to pick up a new replication filter - T240094
  • 14:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:53 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:43 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: REVERT: Wikibase Client: Fix setting name typo (T244529) (duration: 01m 40s)
  • 14:43 Amir1: ladsgroup@mwmaint1002:~$ mwscript createAndPromote.php --wiki=zhwiki --force "Amir Sarabadani (WMDE)" --sysop (T244578)
  • 14:40 hoo@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 14:38 hoo@deploy1001: Synchronized wmf-config/Wikibase.php: Wikibase Client: Fix setting name typo (T244529) (duration: 01m 20s)
  • 14:33 vgutierrez: depool and reimage ncredir3001 as buster - T243391
  • 14:32 vgutierrez: depool & reimage cp4031 as buster - T242093
  • 14:23 vgutierrez: pooling ncredir3002 running buster - T243391
  • 13:26 vgutierrez: pooling cp4021 with buster - T242093
  • 13:05 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:03 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:51 vgutierrez: depool and reimage ncredir3002 as buster - T243391
  • 12:42 vgutierrez: depool & reimage cp4021 as buster - T242093
  • 12:08 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:58 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:57 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:25 vgutierrez: pooling ncredir5001 running buster - T243391
  • 11:24 vgutierrez: pooling cp4022 with buster - T242093
  • 11:09 akosiaris: undo wikifeeds experiments
  • 11:07 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 10:42 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 10:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:36 akosiaris: conduct experiments with stopping/starting uwsgi-ores on ores2001 T242705
  • 10:24 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 10:23 vgutierrez: depool and reimage ncredir5001 as buster - T243391
  • 10:14 vgutierrez: depool & reimage cp4022 as buster - T242093
  • 10:02 akosiaris: increase capacity for wikifeeds by 50% T244535
  • 10:02 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 10:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 09:53 ema: A:mw: increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ T241145
  • 09:09 godog: roll restart cassandra instance on restbase-dev
  • 09:03 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 09:03 godog: restart cassandra on restbase-dev1004 to test logging pipeline onboard
  • 09:01 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 08:59 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1090:3312, db1090:3317', diff saved to https://phabricator.wikimedia.org/P10343 and previous config saved to /var/cache/conftool/dbconfig/20200207-085846-marostegui.json
  • 08:54 marostegui: Upgrade db1090:3312, db1090:3317
  • 08:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1090:3312, db1090:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10342 and previous config saved to /var/cache/conftool/dbconfig/20200207-085432-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1101:3317 T239453', diff saved to https://phabricator.wikimedia.org/P10341 and previous config saved to /var/cache/conftool/dbconfig/20200207-084447-marostegui.json
  • 08:44 moritzm: installing libexif security updates
  • 08:21 akosiaris: deploy https://gerrit.wikimedia.org/r/570726 T244535 to avoid CPU throttling of wikifeeds
  • 08:21 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Increase base weight for db1126', diff saved to https://phabricator.wikimedia.org/P10340 and previous config saved to /var/cache/conftool/dbconfig/20200207-075323-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 T239453', diff saved to https://phabricator.wikimedia.org/P10339 and previous config saved to /var/cache/conftool/dbconfig/20200207-075234-marostegui.json
  • 07:48 marostegui: Remove revision partitions from db2085:3318 T239453
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1126 T232446', diff saved to https://phabricator.wikimedia.org/P10338 and previous config saved to /var/cache/conftool/dbconfig/20200207-074511-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3318 T239453', diff saved to https://phabricator.wikimedia.org/P10337 and previous config saved to /var/cache/conftool/dbconfig/20200207-074407-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 T239453', diff saved to https://phabricator.wikimedia.org/P10336 and previous config saved to /var/cache/conftool/dbconfig/20200207-074258-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1101:3317 T239453', diff saved to https://phabricator.wikimedia.org/P10335 and previous config saved to /var/cache/conftool/dbconfig/20200207-073130-marostegui.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 T232446', diff saved to https://phabricator.wikimedia.org/P10334 and previous config saved to /var/cache/conftool/dbconfig/20200207-073026-marostegui.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 T232446', diff saved to https://phabricator.wikimedia.org/P10333 and previous config saved to /var/cache/conftool/dbconfig/20200207-063831-marostegui.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1105:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10332 and previous config saved to /var/cache/conftool/dbconfig/20200207-063402-marostegui.json
  • 06:31 elukey: force a puppet run on all ores[12] nodes
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10331 and previous config saved to /var/cache/conftool/dbconfig/20200207-062731-marostegui.json
  • 06:26 marostegui: Reboot db1107 for update - T242702
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126 T232446', diff saved to https://phabricator.wikimedia.org/P10330 and previous config saved to /var/cache/conftool/dbconfig/20200207-062502-marostegui.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10329 and previous config saved to /var/cache/conftool/dbconfig/20200207-062345-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1105:3311 T239453', diff saved to https://phabricator.wikimedia.org/P10328 and previous config saved to /var/cache/conftool/dbconfig/20200207-062043-marostegui.json
  • 04:49 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 04:46 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 04:16 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 04:14 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 04:13 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 04:11 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:51 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:49 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 03:42 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 03:40 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:27 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 01:25 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 01:24 robh: eqsin pdu work ongoing starting now. ps1-603 swapping per T242250
  • 00:13 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:11 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 00:09 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 00:08 pt1979@cumin2001: START - Cookbook sre.hosts.downtime

2020-02-06

  • 23:44 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:42 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:37 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:35 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:25 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T244133 [cswikisource] Enable VisualEditor in the Edice namespace (duration: 01m 07s)
  • 23:22 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T159711 T161365 T164435 [nlwiki] Enable VisualEditor in the Project namespace (duration: 01m 08s)
  • 23:21 pt1979@cumin2001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 23:19 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:15 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:13 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 23:10 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T244405 Don't trying to assign to if it's unset (duration: 01m 07s)
  • 22:50 jforrester@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/VisualEditor: T242184 Change tags method so anon edits will go through (duration: 01m 08s)
  • 22:42 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:40 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:39 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:38 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:18 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 22:15 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 22:13 mutante: turning mw2271 and mw2163 into canary appservers for codfw, this adds mediawiki-testers shell users and removes scap sql scripts, rest stays as is (T242606)
  • 21:54 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:52 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:40 twentyafterfour: train blocked due to serious incident related to deploying the latest branch. Incident documentation: https://wikitech.wikimedia.org/wiki/Incident_documentation/20200206-mediawiki refs T233866
  • 21:30 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:27 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 21:05 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 21:03 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 20:52 akosiaris: restart all wikifeeds pods
  • 20:48 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'wikifeeds' for release 'production' .
  • 20:45 akosiaris: restart restbase on restbase1027
  • 20:32 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: (no justification provided)
  • 20:30 twentyafterfour: sync-wikiversions --force
  • 20:30 twentyafterfour@deploy1001: Scap failed!: 9/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 20:25 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.18 refs T233866
  • 19:45 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T244405 Set wgLogoHD before adding wordmark (duration: 01m 06s)
  • 19:36 bblack: re-pool cp1075 (eqiad text)
  • 19:33 addshore: SWAT done!
  • 19:32 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/WikibaseLexemeCirrusSearch: T244479 Update namespace for PrefetchingTermLookup & fix tests (duration: 01m 06s)
  • 19:31 bblack: depool cp1075 (eqiad text) for minor experimentation
  • 19:29 addshore@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Babel/includes/Babel.php: T243713 Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
  • 19:28 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel/includes/Babel.php: T243713 Timeout for meta api call from 10 to 2 seconds. (duration: 01m 07s)
  • 19:25 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 2.IS (duration: 01m 06s)
  • 19:23 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: Fix incorrect spellings of "RESTBase" in config variables (2/2) 1.CS (duration: 01m 07s)
  • 19:23 cdanis: manual puppet run on netflow1001 looked good; ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "run-puppet-agent --enable 'rollout of I60692f0e8 T237587 cdanis'"
  • 19:22 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 19:20 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Fix incorrect spellings of "RESTBase" in config variables (1/2) (duration: 01m 06s)
  • 19:20 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 19:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere T243395, sync again for luck (duration: 01m 06s)
  • 19:12 cdanis: ✔️ cdanis@cumin1001.eqiad.wmnet ~ 🕑☕ sudo cumin A:netflow "disable-puppet 'rollout of I60692f0e8 T237587 cdanis'"
  • 19:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation everywhere T243395 (duration: 01m 07s)
  • 19:05 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 T243395 (duration: 01m 10s)
  • 19:01 moritzm: restarting exim on mendelevium to pick up cyrus-sasl security updates
  • 18:58 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:56 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:55 moritzm: restarting apache on tungsten/dbmonitor to pick up cyrus-sasl security updates
  • 18:53 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@8e15868]: Update mobileapps to ceeb950 (duration: 06m 27s)
  • 18:46 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@8e15868]: Update mobileapps to ceeb950
  • 18:36 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:34 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 18:06 pt1979@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 18:04 pt1979@cumin2001: START - Cookbook sre.hosts.downtime
  • 17:32 herron: set performance cpu scaling governor on maps*
  • 16:49 vgutierrez: pooling ncredir5002 running buster - T243391
  • 16:38 vgutierrez: pooling cp4023 with buster - T242093
  • 16:36 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic (duration: 00m 19s)
  • 16:35 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@524be2b]: airflow: Update ores data transfer from drafttopic -> articletopic
  • 16:35 XioNoX: remove AS prepending in esams/knams
  • 16:31 bblack: lvs1013 - restart pybal for dual bgp session config - T180069
  • 16:30 bblack: lvs1014 - restart pybal for dual bgp session config - T180069
  • 16:30 bblack: lvs1015 - restart pybal for dual bgp session config - T180069
  • 16:29 bblack: lvs1016 - restart pybal for dual bgp session config - T180069
  • 16:28 moritzm: restarting apache on bromine to pick up SASL security updates
  • 16:24 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:22 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:22 moritzm: installing cyrus-sasl2 security updates on jessie
  • 16:20 bblack: lvs2001 - restart pybal for dual bgp session config - T180069
  • 16:19 bblack: lvs2002 - restart pybal for dual bgp session config - T180069
  • 16:19 bblack: lvs2003 - restart pybal for dual bgp session config - T180069
  • 16:07 vgutierrez: depool and reimage ncredir5002 as buster - T243391
  • 16:07 bblack: lvs4005 - restart pybal for dual bgp session config - T180069
  • 16:06 bblack: lvs4006 - restart pybal for dual bgp session config - T180069
  • 16:06 bblack: lvs4007 - restart pybal for dual bgp session config - T180069
  • 16:03 vgutierrez: depool & reimage cp4023 as buster - T242093
  • 16:03 vgutierrez: pooling cp4024 with buster - T242093
  • 15:59 akosiaris: repool eventgate-analytics/eqiad. Experiment proved the failover wouldn't cause (on its own) a problem. Experiment done.
  • 15:58 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 15:57 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: T242705 (duration: 04m 35s)
  • 15:56 vgutierrez: pooling ncredir4001 running buster - T243391
  • 15:55 moritzm: installing qemu security updates
  • 15:54 bblack: lvs5001 - restart pybal for dual bgp session config - T180069
  • 15:53 bblack: lvs5002 - restart pybal for dual bgp session config - T180069
  • 15:53 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: T242705
  • 15:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:52 bblack: lvs5003 - restart pybal for dual bgp session config - T180069
  • 15:50 moritzm: installing python-ecdsa security updates
  • 15:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:41 moritzm: installing jsoup security updates
  • 15:30 vgutierrez: depool & reimage ncredir4001 as buster - T243391
  • 15:29 vgutierrez: depool & reimage cp4024 as buster - T242093
  • 15:28 vgutierrez: pooling ncredir4002 running buster - T243391
  • 15:27 moritzm: installing sudo security updates on jessie
  • 15:23 vgutierrez: pooling cp4025 with buster - T242093
  • 15:14 ema: A:mw-api: force puppet run to increase keepalive_requests from 100 to 200 https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570670/ T241145
  • 15:09 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:07 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:59 godog: extend graphite1004 / graphite2003 fs +200G
  • 14:56 vgutierrez: depool and reimage ncredir4002 as buster - T243391
  • 14:46 vgutierrez: depool & reimage cp4025 as buster - T242093
  • 14:16 akosiaris: 20mins in with eventgate-analytics/eqiad depooled from discovery, no issues yet.
  • 14:14 ema: run puppet on mw-api-canary to revert nginx keepalive_requests bump T241145
  • 13:55 marostegui: Stop MySQL on es1019, upgrade and poweroff for on-site maintenance - T243963
  • 13:54 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=eqiad,dnsdisc=eventgate-analytics
  • 13:53 akosiaris: depool eqiad eventgate-analytics for testing purposes. Requests will flow to codfw, monitoring https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?orgId=1&from=now-30m&to=now for issues.
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for onsite maintenance T243963', diff saved to https://phabricator.wikimedia.org/P10321 and previous config saved to /var/cache/conftool/dbconfig/20200206-135157-marostegui.json
  • 13:45 XioNoX: rollback deactivate BGP transits on cr3-knams
  • 13:34 elukey: repool mw1347 with mcrouter running with 10 proxy threads (was: 5)
  • 13:31 XioNoX: reboot cr3-knams
  • 13:31 elukey: depool mw1347 to test some mcrouter settings
  • 13:27 XioNoX: deactivate BGP transits on cr3-knams
  • 13:22 vgutierrez: Enable server session sharing on ats-tls in cp4031 - T244464
  • 13:10 XioNoX: rollback: deactivate BGP transits on cr2-eqsin
  • 13:00 XioNoX: reboot cr2-eqsin for sw upgrade
  • 13:00 addshore: SWAT done
  • 13:00 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync REVERT Enable EntitySourceBasedFederation for group1 (duration: 01m 07s)
  • 12:59 XioNoX: deactivate BGP transits on cr2-eqsin
  • 12:58 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: REVERT Enable EntitySourceBasedFederation for group1 T243395, due to T244479 (duration: 01m 07s)
  • 12:52 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group1 T243395 (duration: 01m 06s)
  • 12:46 addshore@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Babel: REVERT Fetch central babel information over SQL query, not API (T243726) (duration: 01m 07s)
  • 12:44 addshore@deploy1001: sync-file aborted: Fetch central babel information over SQL query, not API (T243726) (duration: 01m 04s)
  • 12:40 vgutierrez: pooling cp3065 - T242093
  • 12:39 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable EntitySourceBasedFederation for group0 T243395 (duration: 01m 07s)
  • 12:34 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Re-enable delayed new upload jobs for MachineVision extension (duration: 01m 08s)
  • 12:26 cparle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Remove handler deleted from the MachineVision extension (duration: 01m 05s)
  • 12:25 XioNoX: remove full-duplex statement from eqsin Tata link (not supported on Junos 18, as 10G is full duplex anyway)
  • 12:24 cparle@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/MachineVision: Use the wbsetclaim API to add depicts statements (duration: 01m 09s)
  • 12:07 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5e1cbb2: Enable CX in te, kn, gu, mr and pawiki as a default tool (T243271, T243272, T243273, T243274, T243275) (duration: 01m 09s)
  • 11:41 akosiaris: upgrade etherpad-lite on etherpad1002 to 1.8.0-1
  • 11:38 kart_: Updated cxserver to 2020-02-05-051751-production (T244230, T234323)
  • 11:35 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:33 akosiaris: upload etherpad-lite_1.8.0-1 to apt.wikimedia.org buster-wikimedia/main
  • 11:31 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 11:28 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:11 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:21 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348". no effect observed
  • 10:20 akosiaris: undo "switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348"
  • 10:19 vgutierrez: Enabling HTTP keepalive between ats-tls and varnish-frontend on cp4031 - T244464
  • 10:00 vgutierrez: depool and reimage cp3065 as buster - T242093
  • 09:59 vgutierrez: upload trafficserver 8.0.5-1wm14 to apt.wm.o (buster) - T242093
  • 09:08 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b (duration: 11m 41s)
  • 08:56 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b
  • 08:45 dcausse@deploy1001: Finished deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b to wdqs1010.eqiad.wmnet (duration: 00m 29s)
  • 08:44 dcausse@deploy1001: Started deploy [wdqs/wdqs@4306c64]: deploying wdqs 0.3.14-SNAPSHOT and gui 5a1af3b to wdqs1010.eqiad.wmnet
  • 08:23 marostegui: Reboot dbproxy1012 and dbproxy1014 for upgrade
  • 08:18 dcausse: restarting blazegraph on wdqs1006: T242453
  • 08:17 akosiaris: switchover selectively eventgate-analytics.discovery.wmnet to codfw for mw1331 and mw1348 to
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10319 and previous config saved to /var/cache/conftool/dbconfig/20200206-065906-marostegui.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10318 and previous config saved to /var/cache/conftool/dbconfig/20200206-065238-marostegui.json
  • 06:46 elukey: run puppet on all ores[12]* nodes
  • 02:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 02:42 mutante: ganeti - Creating new VM named install2003.codfw.wmnet in codfw with row=A vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private (T244390)
  • 02:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 02:30 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 02:21 mutante: ganeti - Creating new VM named install1003.eqiad.wmnet in eqiad with row=C vcpu=1 memory=1 gigabytes disk=20 gigabytes link=private (T244390)
  • 02:20 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm

2020-02-05

  • 23:30 ebernhardson: delete search indices duplicated on multiple clusters for: hywwiki, chrwiktionary, gcrwiki, mnwwiki, noboard_chapterswikimedia, nqowiki, nrmwiki, outreachwiki and srnwiki
  • 23:08 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@a51f927]: Update mobileapps to a7928fa (duration: 10m 48s)
  • 22:57 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@a51f927]: Update mobileapps to a7928fa
  • 22:07 mutante: Gerrit - added ppchelko to 'wmf-deployment' Gerrit group (he is already in deployment admin group) (T244389)
  • 21:37 arlolra@deploy1001: Finished deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to 74730a3 (duration: 03m 07s)
  • 21:33 arlolra@deploy1001: Started deploy [parsoid/deploy@01d9d3d]: Updating Parsoid to 74730a3
  • 21:31 mutante: killing and restarting wikibugs, it was reporting each update twice
  • 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy (duration: 00m 07s)
  • 20:51 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5] (thin): Analytics regular weekly deploy
  • 20:51 joal@deploy1001: Finished deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy (duration: 13m 28s)
  • 20:50 mutante: ores1004 - systemctl start celery-ores-worker
  • 20:45 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.18 refs T233866 (duration: 01m 07s)
  • 20:44 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.18 refs T233866
  • 20:37 joal@deploy1001: Started deploy [analytics/refinery@a47f0d5]: Analytics regular weekly deploy
  • 20:34 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1269.eqiad.wmnet
  • 20:25 dzahn@cumin1001: conftool action : set/weight=25; selector: name=mw1267.eqiad.wmnet
  • 20:25 mutante: mw1267 restarting php7.2-fpm
  • 20:21 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version (duration: 00m 08s)
  • 20:21 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@714e2d0]: Deploy bug fix version
  • 20:09 twentyafterfour: Preparing to deploy wmf/1.35.0-wmf.18 to group1 wikis refs T233866
  • 20:09 moritzm: installing git security updates for jessie
  • 20:00 moritzm: installing unzip security updates
  • 19:44 mutante: LDAP - added spramduya to wmf group (T243802)
  • 19:38 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up VisualEditor settings (duration: 01m 07s)
  • 19:38 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad, daemons appear stuck and not reading new messages
  • 19:19 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T238029 Enable InukaPageView logging on production Wikipedias (duration: 01m 07s)
  • 19:15 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: Sync back revert of 975b4bbb9 (duration: 01m 06s)
  • 19:10 jforrester@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:35 vgutierrez: pooling cp5012 - T242093
  • 18:23 vgutierrez: rebooting cp5012 - T242093
  • 18:21 elukey: restart memcached on mc1025 with 8 threads (rollback - revert https://gerrit.wikimedia.org/r/#/c/570370/, run puppet, restart memcached)
  • 17:51 mutante: ganeti1017 - rebooting (not in use yet)
  • 17:34 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/languages/: T244300 (duration: 01m 13s)
  • 17:33 reedy@deploy1001: Synchronized php-1.35.0-wmf.18/includes/: T244300 (duration: 01m 14s)
  • 16:53 urandom: Sessionstore deployment (mediawiki-config) is done
  • 16:37 ppchelko@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:569678 Config: Enable sessionstore on group0 and 1 T243106 (duration: 01m 08s)
  • 16:25 jforrester@deploy1001: Synchronized wmf-config/CommonSettings.php: T232140 Restore wgLogoHD to wikis without a MinervaCustomLogos defined (duration: 01m 09s)
  • 16:07 elukey: update puppet compiler's facts
  • 15:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:52 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 15:29 effie: restart php-fpm on canaries - T236800
  • 15:24 effie: Rollout php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 to api, app and jobrunner canaries - T236800
  • 15:15 vgutierrez: depooling & reimaging cp5012 as buster - T242093
  • 15:12 ema: cp: unset Accept-Encoding from ats-be requests to applayer T242478
  • 14:35 vgutierrez: updating acme-chief to version 0.24 - T244236
  • 14:32 _joe_: restarting mcrouter at nice -19 on mw1331 for testing effects of that change
  • 14:30 vgutierrez: upload acme-chief 0.24 to apt.wm.o (buster) - T244236
  • 14:26 XioNoX: push initial flowspec config to all routers
  • 14:23 vgutierrez: pooling cp5006 - T242093
  • 14:13 ema: cp1075: back to leaving Accept-Encoding as it is due to unrelated applayer issues T242478
  • 13:46 marostegui: Decrease buffer pool size on db1107 for testing - T242702
  • 13:45 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:42 akosiaris: undo the manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency. Restart php-fpm
  • 13:41 ema: cp1075: unset Accept-Encoding on origin server requests T242478
  • 13:39 Amir1: EU SWAT is done
  • 13:38 ema: cp: disable puppet and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/570311/ T242478
  • 13:35 XioNoX: rollback traffic steering off cr2-eqord
  • 13:29 akosiaris: manually set 10.2.1.42 eventgate-analytics.discovery.wmnet in /etc/hosts for mw1331, mw1348. Verify hypothesis that this should cause increased latency
  • 13:25 XioNoX: reboot cr2-eqord for software upgrade - yaaaaa
  • 13:24 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: Cache PropertyInfoLookup internally (T243955) (duration: 01m 07s)
  • 13:17 XioNoX: increase ospf cost for cr2-eqord links
  • 13:16 vgutierrez: upload acme-chief 0.23 to apt.wm.o (buster) - T244236
  • 13:15 XioNoX: disable transit/peering BGP sessions on cr2-eqord
  • 13:15 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/CachingPropertyInfoLookup.php: SWAT: Cache PropertyInfoLookup internally (T243955) (duration: 01m 07s)
  • 13:10 XioNoX: rollback: disable transit/peering BGP sessions on cr2-eqdfw
  • 13:08 vgutierrez: depooling & reimaging cp5006 as buster - T242093
  • 13:03 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5cc2b70: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 06s)
  • 13:01 XioNoX: reboot cr2-eqdfw for software upgrade
  • 13:00 Amir1: SWAT needs more time
  • 12:55 XioNoX: disable transit/peering BGP sessions on cr2-eqdfw
  • 12:50 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: d450288: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 07s)
  • 12:48 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5cc2b70: wgLogoHD and $wgVectorPrintLogo is replaced with wgLogos (T232140) (duration: 01m 07s)
  • 12:32 awight@deploy1001: Synchronized php-1.35.0-wmf.18/extensions/Cite: SWAT: Revert follow standardization (T240858) (duration: 01m 13s)
  • 10:53 akosiaris: rolling restart of all pods on kubernetes staging cluster to make sure everything is fine after the upgrade
  • 10:50 akosiaris: T244335 upgrade kubernetes-node on kubestage1002.eqiad.wmnet to 1.13.12
  • 10:43 ema: cp4028: varnish-frontend-restart T243634
  • 10:24 akosiaris: T244335 upgrade kubernetes-master on neon.eqiad.wmnet (staging)
  • 10:24 effie: Upload php-apcu_5.1.17+4.0.11-1+0~20190217111312.9+stretch~1.gbp192528+wmf2 - T236800
  • 10:10 Urbanecm: Run mwscript deleteEqualMessages.php --delete to delete GrowthExperiments' message overrides (cswiki, viwiki, arwiki, kowiki)
  • 09:57 akosiaris: upload kubernetes 1.13.12 to apt.wikimedia.org stretch-wikimedia/main T244335
  • 09:51 effie: install libmemcached-tools on mc-gp* servers - T240684
  • 09:05 ema: add individual FortiGate IPs hitting ulsfo (currently cp4028) to vcl blocked_nets -- trying to identify problematic traffic T243634
  • 07:02 marostegui: Replay s1 traffic on db1107 (10.4) T242702
  • 06:32 elukey: force a puppet run on ores* hosts
  • 06:12 marostegui: Remove partitions from revision table db1098:3317 - T239453
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10312 and previous config saved to /var/cache/conftool/dbconfig/20200205-060942-marostegui.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085:3311, db2086:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10311 and previous config saved to /var/cache/conftool/dbconfig/20200205-060911-marostegui.json
  • 02:38 cdanis: T243634 ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕤🍺 sudo varnish-frontend-restart

2020-02-04

  • 22:35 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.18 refs T233866
  • 22:13 twentyafterfour@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.18 refs T233866 (duration: 32m 03s)
  • 22:03 cdanis@cumin2001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 21:41 twentyafterfour@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.18 refs T233866
  • 21:29 twentyafterfour: preparing the new mediawiki branch for deployment to test wikis
  • 20:31 shdubsh: restart kartotherian on maps2001
  • 20:24 shdubsh: temporarily enable access logs on maps2001
  • 20:20 twentyafterfour: branching mediawiki to wmf/1.35.0-wmf.18 from commit 054dd94e97d6 - train blockers should be added as subtasks under T233866
  • 20:06 marxarelli: temporarily holding 1.35.0-wmf.18 (T233866) branch cut and train due to concurrent maps prod issues
  • 19:15 mutante: cp3065 - powercycling
  • 18:45 cdanis@cumin2001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 17:57 cdanis: ✔️ cdanis@mw1272.eqiad.wmnet ~ 🕐☕ sudo restart-php7.2-fpm
  • 17:41 akosiaris: reenable kartotherian on maps100*
  • 17:34 oblivian@cumin1001: conftool action : set/weight=15; selector: cluster=appserver,service=nginx,dc=eqiad,name=mw12[3-5].*
  • 17:13 _joe_: restarting php-fpm on mw126[1-3]
  • 17:11 _joe_: restarting php-fpm on mw1266-9
  • 17:10 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/includes/filerepo/file/ForeignDBFile.php: gerrit: 570089, ongoing incident (duration: 01m 04s)
  • 17:07 _joe_: restarted php-fpm on mw1265 with 80 workers (the default)
  • 17:07 _joe_: restarted php-fpm on mw1264 with 240 workers
  • 16:52 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase: fix for the recent outage (duration: 01m 21s)
  • 16:02 ema: cp: rolling ats-backend-restart to unset Accept-Encoding before sending origin server requests T242478
  • 14:23 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:18 akosiaris: deploy new wikifeeds chart that is consistent with the current scaffolding approach. No code deploy though.
  • 14:17 akosiaris@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 14:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 14:07 XioNoX: repool ulsfo
  • 14:03 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 14:00 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 13:36 XioNoX: restart cr3-ulsfo for software upgrade
  • 13:23 vgutierrez: upgrading acme-chief to version 0.22 - T240614
  • 13:10 vgutierrez: uploaded acme-chief 0.22 to apt.wm.o (buster) - T240614
  • 13:09 XioNoX: restart cr4-ulsfo for upgrade
  • 12:49 XioNoX: depool ulsfo for routers upgrade
  • 10:35 ema: cp4032: varnish-frontend-restart T243634
  • 09:08 vgutierrez: manually refreshing OCSP stapling response for non-canonical-redirects-3 - T243948
  • 09:07 marostegui: Upgrade s3 codfw master db2105 - T239791
  • 08:56 marostegui: Deploy schema change on enwiki eqiad host by host - T243804
  • 08:46 marostegui: Deploy schema change on enwiki codfw - T243804
  • 08:16 marostegui: Deploy schema change on testwiki - T243804
  • 08:13 marostegui: Deploy schema change on test2wiki - T243804
  • 07:36 marostegui: Upgrade Mariadb on db1107 from 10.4.11 to 10.4.12 T242702
  • 07:15 marostegui: Compress db1126 - T232446
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126 - T232446', diff saved to https://phabricator.wikimedia.org/P10302 and previous config saved to /var/cache/conftool/dbconfig/20200204-071420-marostegui.json
  • 07:09 marostegui: Compress db1091 - T232446
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1091 - T232446', diff saved to https://phabricator.wikimedia.org/P10301 and previous config saved to /var/cache/conftool/dbconfig/20200204-070804-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3311, db2086:3317 - T239453', diff saved to https://phabricator.wikimedia.org/P10300 and previous config saved to /var/cache/conftool/dbconfig/20200204-070533-marostegui.json
  • 06:48 elukey: force a puppet run on all ores[12] nodes
  • 00:14 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [enwiki] Add Commons as an import source T242884 (duration: 00m 57s)
  • 00:09 mutante: gerrit1002 - replaced ens5 with ens6 in /etc/network/interfaces (IP and row had changed in the past, needed manual fix after reboot and now came back) ; mkfs.ext4 /dev/vdb on new additional 10GB disk. (T239151 T243983)
  • 00:06 jforrester@deploy1001: Synchronized dblists/visualeditor-nondefault.dblist: [nlwiki] Enable VisualEditor by default for all users T161365 (duration: 00m 58s)
  • 00:05 mutante: gerrit1002 - attempt to manually fix /etc/network/interfaces, add IP on interface, reboot
  • 00:03 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Configure remainder of testwikis group for kask-session T243106 (duration: 00m 58s)
  • 00:02 volans: depool, varnish-frontend-restart, pool on cp4029 (~242k fds) - T243634

2020-02-03

  • 23:34 mutante: rebooting gerrit1002 (test VM)
  • 23:26 mutante: ganeti1003 - sudo gnt-instance modify --disk add:size=10G gerrit1002.wikimedia.org (T239151 T243983)
  • 23:24 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
  • 23:21 mutante: gerrit1002 - deleting gerrit.log and gerrit.json files from January to free about 4GB of space (T239151 T243983)
  • 23:12 XioNoX: removing AS15542 from esams
  • 22:18 andrew@deploy1001: Finished deploy [horizon/deploy@8bffc7d]: Fix for T243355 (duration: 03m 29s)
  • 22:14 andrew@deploy1001: Started deploy [horizon/deploy@8bffc7d]: Fix for T243355
  • 22:13 mutante: rebooting ganeti1010, ganeti1011 and other new ganeti machines to pick up microcode mitigations; for some reason the previous reboots did not do it. Rescheduled service check on icinga for ganeti1010 and now it recovered (T228924)
  • 22:05 mutante: ganeti1010 - rebooting host to clear microcode mitigations CPU alert
  • 21:39 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.35.0-wmf.15"
  • 21:33 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
  • 21:28 brennen@deploy1001: Synchronized php-1.35.0-wmf.16/includes/TemplateParser.php: Syncing https://gerrit.wikimedia.org/r/c/mediawiki/core/+/569643 for T243548 (duration: 01m 08s)
  • 21:14 halfak@deploy1001: Finished deploy [ores/deploy@50a101a]: T243451 (duration: 12m 47s)
  • 21:01 halfak@deploy1001: Started deploy [ores/deploy@50a101a]: T243451
  • 20:43 mutante: doc1001 - sudo chown -R doc-uploader:doc-uploader /srv/docroot/
  • 20:19 XioNoX: reactivate L3 only LB in esams/knams
  • 20:19 XioNoX: remove test flowspec rule from cr3-knams
  • 20:13 mutante: doc1001 - re-enabled puppet after merging gerrit:569620 - Git::Clone[integration/docroot]/File[/srv/docroot]/mode: mode changed '2775' to '0755' - Profile::Doc/File[/srv/docroot/org/wikimedia/doc]/group: group changed 'doc-uploader' to 'wikidev', mode changed '0775' to '0755'. needs another follow-up (T237707)
  • 19:27 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [officewiki] Enable VisualEditor desktop section editing (duration: 01m 07s)
  • 19:21 Urbanecm: Morning SWAT done
  • 19:20 urbanecm@deploy1001: Synchronized wmf-config/InterwikiSortOrders.php: SWAT: 7b53a52: Add gcr, mnw and szy to InterwikiSortOrders (duration: 01m 11s)
  • 19:19 mutante: doc1001 - chown -R doc-uploader:doc-uploader /srv/docroot ; temp. disabled puppet (T237707)
  • 19:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7bb6a12: Configure remainder of testwikis group for kask-transition (T243106) (duration: 01m 14s)
  • 18:58 mutante: < bblack> !log doc1001: chown -R nobody:wikidev /srv/docroot | < mutante> !doc1001 sudo -u doc-uploader chmod g+w /srv/docroot/org/wikimedia/doc | https://gerrit.wikimedia.org/r/c/operations/puppet/+/484304 | (T237707)
  • 18:44 bblack: doc1001: chown -R nobody:wikidev /srv/docroot
  • 18:34 brennen: edited /srv/mediawiki-staging/wikiversions.json on deploy1001; scap pull and scap wikiversions-compile on mwdebug1002; revert wikiversions changes on deploy1001.
  • 18:25 mholloway-shell@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 18:23 mholloway-shell@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'wikifeeds' for release 'production' .
  • 18:17 mholloway-shell@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'wikifeeds' for release 'staging' .
  • 16:52 eevans@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:48 eevans@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'sessionstore' for release 'production' .
  • 16:38 eevans@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'sessionstore' for release 'staging' .
  • 15:38 XioNoX: rollback: add debug on eqiad-knams link interfaces - T240659
  • 15:33 XioNoX: add debug on eqiad-knams link interfaces - T240659
  • 14:59 moritzm: restarting exim on phab* to pick up libidn security update
  • 14:55 moritzm: restarting superset on an-tool1004/1005 to pick up libidn security update
  • 14:44 moritzm: restarting apache on an-tool*, cloudmetrics*, logstash*, grafana1002 to pick up libidn security update
  • 14:21 moritzm: restarting slapd on ldap-corp* to pick up libidn2 security updates
  • 14:18 cdanis: T243634 ✔️ cdanis@cp4031.ulsfo.wmnet ~ 🕤☕ sudo varnish-frontend-restart
  • 13:58 moritzm: installing libidn2 security updates
  • 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:32 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:31 moritzm: rebooting ganeti1009 - ganeti1022 to pick up microcode update T228924
  • 12:58 XioNoX: deactivate v6 BGP to AS25596
  • 12:57 moritzm: installing spamassassin security updates
  • 12:53 Urbanecm: Previous message should be "EU SWAT done"
  • 12:52 Urbanecm: Morning SWAT done
  • 12:52 Urbanecm: Purge https://en.wikipedia.org/static/images/project-logos/zh_classicalwiki*.png (T243509)
  • 12:51 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: af0b745: Update logo for zh_classical wiki (T243509) (duration: 01m 06s)
  • 12:45 urbanecm@deploy1001: Synchronized dblists/mobilemainpagelegacy.dblist: SWAT: e9387b2: Disable MobileFrontend Mainpage special casing on frwiktionary (T241888) (duration: 01m 05s)
  • 12:40 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5f13c19: Add minerva custom logo for la.wiki (T240728; 2/2) (duration: 01m 06s)
  • 12:37 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 5f13c19: Add minerva custom logo for la.wiki (T240728; 1/2) (duration: 01m 06s)
  • 12:35 moritzm: installing openjpeg2 security updates
  • 12:32 Urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-szl.svg (T233104)
  • 12:30 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 76e67cd: e266e25: Add wordmarks for szlwiki and etwiki (T233104, T230379) (duration: 01m 06s)
  • 12:29 urbanecm@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: 76e67cd: e266e25: Add static wordmarks for szlwiki and etwiki (T233104, T230379) (duration: 01m 06s)
  • 12:25 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 32e0356: Add vzg-easydb.gbv.de to the wgCopyUploadsDomains (T243118) (duration: 01m 07s)
  • 12:20 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6c48af8: Assign editautopatrolprotected to hewiki patrollers (T243665) (duration: 01m 06s)
  • 12:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 6b497e7: Wikidata - enable TaintedRefs (T241989) (duration: 01m 06s)
  • 12:09 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0c0ef87: Add wgImportSources for hiwikibooks (T244022) (duration: 01m 05s)
  • 12:07 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Remove $wgImgAuthDetails=true (T153459) (duration: 01m 36s)
  • 11:38 ema: powercycle cp3057 T244127 T238305
  • 10:24 godog: temp disable puppet on cp hosts as precaution for https://gerrit.wikimedia.org/r/c/operations/puppet/+/563977
  • 10:08 moritzm: installing sudo security updates on stretch

2020-02-02

  • 19:25 effie: restart varnish on cp4028
  • 08:48 effie: reboot host analytics1061 - T244081

2020-02-01

  • 18:17 effie: pool scb2003, no need for host to stay depooled - T244069
  • 17:46 cdanis: T243634 ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕐☕ sudo varnish-frontend-restart
  • 17:27 effie: depool scb2003 T244069
  • 16:51 effie: pool mw1273
  • 16:50 effie: pool scb2003
  • 16:30 elukey: powerup analytics1073 (attempt to see if it was only a kernel-related crash) - T244064
  • 16:16 effie: poweroff analytics1073 - T244064
  • 13:00 effie: depool scb2003
  • 12:21 effie: depool mw1273
  • 01:03 eileen: process-control config revision is c3c8bde761
  • 00:50 eileen: civicrm revision changed from fcc5673ee7 to ee9edf8137, config revision is 2a61da0ace

2020-01-31

  • 22:25 eileen: civicrm revision changed from ac730a6bcb to fcc5673ee7, config revision is 2a61da0ace
  • 22:14 bstorm_: repooled labsdb1011 now that view work is done
  • 22:00 eileen: process-control config revision is 2a61da0ace disabled process-control
  • 21:59 bstorm_: depooled labsdb1011
  • 21:32 bstorm_: updated views on labsdb1010
  • 21:22 bstorm_: updated views on labsdb1009
  • 21:21 bstorm_: updated actor views on labsdb1012
  • 18:17 bblack: repool cp4032 (buster)
  • 18:17 bblack@cumin1001: conftool action : set/pooled=yes; selector: name=cp4032.ulsfo.wmnet
  • 18:14 bblack: repool cp4029
  • 18:13 bblack: restarted ats-tls and varnish-fe on cp4029
  • 18:05 bblack: depool varnish-fe on cp4029
  • 18:03 bblack: depool ats-tls on cp4029
  • 16:59 marostegui: Re-enable notifications on the dbstore1005:3318 check T243871
  • 09:18 addshore: addshore@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildPropertyTerms.php --wiki=wikidatawiki --sleep 4 --batch-size=25 # In a screen for T219301
  • 03:22 mutante: powercycling crashed cp3063
  • 01:09 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@322ee4c]: Update mobileapps to 3eec28d (duration: 06m 53s)
  • 01:02 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@322ee4c]: Update mobileapps to 3eec28d
  • 00:41 mutante: contint1001/contint2001 - upgrading jenkins to 2.219
  • 00:36 mutante: releases2001: upgrading jenkins to 2.219; install1002: import jenkins 2.219 into jessie-wikimedia APT repo
  • 00:31 mutante: importing jenkins 2.219 to stretch-wikimedia APT repo; releases1001: upgrading jenkins to 2.219

2020-01-30

  • 19:37 mutante: copying /var/log/apache2 to /root on all eqiad mw appservers to preserve logs
  • 18:07 vgutierrez: depool cp4032 and perform a rolling restart of varnish-fe at cp4027-cp4031 - T243634
  • 17:51 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/lib/includes/Store/Sql/Terms/FingerprintableEntityTermStoreTrait.php: wbterms: Fix incorrect deletion of rows in findActuallyUnusedTermIds (T243944) (duration: 01m 06s)
  • 17:49 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/repo/maintenance/rebuildItemTerms.php: wbterms: Write only to the new term store in rebuildItemTerms (T243944) (duration: 01m 09s)
  • 17:03 vgutierrez: repooling cp4032 - T243634
  • 17:02 vgutierrez: restarting varnish-frontend on cp4031 before it crashes - T243634
  • 16:26 vgutierrez: manually refreshing OCSP stapling response for non-canonical-redirects-3 - T243948
  • 12:22 arturo: add prometheus 2.7.1+ds-3+k8s+buster to buster-wikimedia T238096 (basically a rebuild from stretch)
  • 06:23 vgutierrez: restarting varnish-frontend on cp4030 before it crashes - T243634
  • 06:21 vgutierrez: depool cp4032 - T243634
  • 05:12 vgutierrez: restarting varnish-frontend and repooling cp4029 - T243634
  • 05:00 vgutierrez: depooling cp4029

2020-01-29

  • 23:37 marostegui: Remove partitions from db2087:3317 - T239453
  • 18:17 XioNoX: move knams netflow sampling to cr3-knams
  • 17:19 krinkle@deploy1001: Synchronized wmf-config/etcd.php: Ice8dad2 (duration: 01m 10s)
  • 01:11 vgutierrez: varnish-frontend restarted on cp4031
  • 01:09 vgutierrez: repool cp4031
  • 01:05 marostegui: Disable notifications for dbstore1005:3318 slave lag - T243871
  • 01:03 vgutierrez: depool cp4031
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10289 and previous config saved to /var/cache/conftool/dbconfig/20200129-003507-marostegui.json
  • 00:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10288 and previous config saved to /var/cache/conftool/dbconfig/20200129-002203-marostegui.json

2020-01-28

  • 23:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10287 and previous config saved to /var/cache/conftool/dbconfig/20200128-235336-marostegui.json
  • 23:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10286 and previous config saved to /var/cache/conftool/dbconfig/20200128-234601-marostegui.json
  • 23:42 marostegui@cumin1001: dbctl commit (dc=all): 'Start repooling db1084 with its original weight', diff saved to https://phabricator.wikimedia.org/P10285 and previous config saved to /var/cache/conftool/dbconfig/20200128-234219-marostegui.json
  • 23:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1121 T232446', diff saved to https://phabricator.wikimedia.org/P10284 and previous config saved to /var/cache/conftool/dbconfig/20200128-234037-marostegui.json
  • 15:06 addshore: Start addshore@mwmaint1002:~$ ./T219123.sh # Taking over from @ladsgroup for T219123
  • 09:59 effie: rolling restart mobileapps in codfw
  • 02:05 mutante: gerrit1002 - gzipping a bunch of /var/log/gerrit/ log files (T243808)

2020-01-27

  • 23:40 eileen: civicrm revision changed from fbd5c35fb0 to ac730a6bcb, config revision is 837b9d0703
  • 23:10 vgutierrez: rolling restart of varnish-frontend in cp4026 and cp4027
  • 23:06 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 23:06 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 23:01 _joe_: restart apache on gerrit
  • 22:58 vgutierrez: restarting gerrit service
  • 22:01 vgutierrez: restarting varnish-fe on cp4028
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311 - T239453', diff saved to https://phabricator.wikimedia.org/P10277 and previous config saved to /var/cache/conftool/dbconfig/20200127-191614-marostegui.json
  • 19:15 marostegui: Remove partitions from db2085 enwiki - T239453
  • 13:58 vgutierrez: repooling cp4030 - T243634
  • 13:54 vgutierrez: restarting varnish-fe on cp4030 - T243634
  • 13:54 vgutierrez: repooling cp4029 - T243634
  • 13:36 vgutierrez: restarting varnish-fe on cp4029 - T243634
  • 12:10 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --from-id 1860 --to-id 1860 (T243705)
  • 03:29 gehel: restarting blazegraph on wdqs100[57]

2020-01-26

  • 21:45 akosiaris: repool maps1003
  • 21:45 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=maps1003.*
  • 21:42 akosiaris: test depool maps1003
  • 21:42 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=maps1003.*
  • 21:38 vgutierrez: powercycling cp3051 - T238305
  • 21:23 akosiaris: restart kartotherian on maps1002
  • 21:19 vgutierrez: restart varnish-fe and ats-tls on cp3056
  • 21:02 bblack: ats-tls-restart on cp3064
  • 20:51 bblack: esams text caches: reverting earlier sysctl mitigations
  • 18:11 volans: shutdown elastic2043 - T243715
  • 18:01 volans: depooled elastic2043 - T243715
  • 18:01 volans@cumin1001: conftool action : set/pooled=inactive; selector: name=elastic2043.codfw.wmnet
  • 17:28 elukey: restart varnishkafka-webrequest on cp3064
  • 17:25 elukey: restart varnishkafka-webrequest on cp3056
  • 17:03 bblack: reduce /proc/sys/net/ipv4/tcp_max_syn_backlog to 8192 on esams text caches
  • 16:55 bblack: reduce /proc/sys/net/ipv4/tcp_synack_retries to 1 on esams text caches
  • 16:42 cdanis: ✔️ cdanis@cp4030.ulsfo.wmnet ~ 🕦☕ sudo depool
  • 16:38 bblack: applying GRE MTU mitigation from T232602 to all cp1, cp3, cp5 cache nodes
  • 15:43 XioNoX: 3*prepend in esams/knams
  • 15:26 elukey: repool deployed
  • 15:24 elukey: repool esams
  • 15:01 cdanis: deployed
  • 15:00 cdanis: depool esams
  • 14:56 XioNoX: enabling netflow sampling on the knams-esams links (esams side)
  • 11:25 effie: restarted tilerator and tileratorui on maps1002
  • 11:23 effie: restarted tilerator and tileratorui on maps1001
  • 10:38 effie: deployed
  • 10:37 effie: Pool esams back
  • 01:12 cdanis: deployed
  • 01:12 cdanis: depool esams with new geo-maps-esams-offline

2020-01-25

  • 12:49 Urbanecm: Run mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=mediawikiwiki --logwiki=metawiki TokyVrpns Mike20LCN (T243668)
  • afk: restarting gerrit-replica

2020-01-24

  • 22:31 mutante: ganeti1003 - sudo gnt-instance remove etherpad1001.eqiad.wmnet (T224580)
  • 22:21 mutante: shutting down etherpad1001 - service fully migrated to etherpad1002 - running decom cookbook on ganeti VM (T224580)
  • 22:20 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 22:19 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:18 cdanis: ✔️ cdanis@cp4029.ulsfo.wmnet ~ 🕟🍵 sudo depool
  • 17:54 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Clean up CheckUser config (duration: 01m 09s)
  • 15:43 gehel: restart blazegraph + updater on wdqs1007 (seems stuck, known issue)
  • 15:33 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
  • 14:28 vgutierrez: uploaded mtail 3.0.0~rc5-1~bpo9+1wmf2 to apt.wm.o (buster) - T243591
  • 14:26 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:24 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 14:23 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:16 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 11:09 moritzm: purged stale grafana package from grafana1001, caused systemd unit failure
  • 11:04 effie: restart php-fpm on mw1238-mw1239
  • 09:29 akosiaris: disable and mask etherpad-lite on etherpad1002 to avoid corruption issues. T224580
  • 08:42 marostegui: Remove wikiadmin2 user from pc2XXX codfw hosts T243512
  • 08:17 moritzm: installing python-apt security updates
  • 07:19 _joe_: force run puppet on all esams cache nodes, for mitigation of T243313
  • 06:37 marostegui: Stop replication on db1107
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2085 after memory replacement T243148', diff saved to https://phabricator.wikimedia.org/P10256 and previous config saved to /var/cache/conftool/dbconfig/20200124-061228-marostegui.json
  • 01:24 mutante: running puppet on cp-text_ulsfo
  • 00:46 mutante: cp4032 - starting varnishmtail.service
  • 00:36 catrope@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/CentralNotice/resources/ext.centralNotice.display/hide.js: T240802 (duration: 01m 05s)
  • 00:34 catrope@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/CentralNotice/resources/ext.centralNotice.display/hide.js: T240802 (duration: 01m 07s)
  • 00:33 mutante: cp4032 - starting varnishmtail.service which was failed
  • 00:32 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump Parsoid/PHP cluster memory_limit again (T239806, T236833) (duration: 01m 05s)

2020-01-23

  • 21:08 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
  • 20:30 brennen@deploy1001: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.35.0-wmf.15"
  • 20:29 brennen: reverting group2 to 1.35.0-wmf.15
  • 20:10 brennen@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.16
  • 20:00 Urbanecm: Morning SWAT done
  • 19:56 mlitn@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add 3d-patents page to wgForceUIMsgAsContentMsg (duration: 01m 08s)
  • 19:15 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 2d8f773: Use editeditorprotected for protecting pages for editors (T230103) (duration: 01m 05s)
  • 19:10 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/WikimediaMessages/extension.json: SWAT: 23a6f8e: InukaPageView: update schema version (T238029) (duration: 01m 05s)
  • 19:06 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 629b5fc: Add *.eso.org to the wgCopyUploadsDomains (T243423) (duration: 01m 06s)
  • 19:03 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 18:59 mutante: ganeti1003 - creating new VM etherpad1002.eqiad.wmnet with 1GB RAM and 10GB disk, row C, private link (T243475)
  • 18:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 18:54 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
  • 18:47 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
  • 18:40 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Stop setting wgWikimediaMessagesPartialBlockBanner, never read T240300 (duration: 01m 06s)
  • 18:35 rlazarus: etcd main cluster switchover complete, eqiad is now read-write
  • 18:28 otto@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'eventstreams' for release 'production' .
  • 18:27 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
  • 18:22 vgutierrez: pooling cp4032 running buster - T242093
  • 18:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventstreams' for release 'production' .
  • 18:05 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:05 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:03 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:03 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:02 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:01 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:59 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:53 robh@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 17:53 robh@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:52 _joe_: running systemctl reset-failed on conf1005 to clear useless alerts
  • 17:33 marostegui: Poweroff db2085:3311 and db2085:3318 for maintenance - T243148
  • 17:33 jforrester@deploy1001: Synchronized static/images/project-logos: [trwiki] Tweak logo versions T242977 (duration: 01m 07s)
  • 17:00 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:59 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:58 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:56 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 16:27 vgutierrez: depool cp4032 and reimage as buster - T242093
  • 16:26 vgutierrez: pooling cp4026 running buster - T242093
  • 16:02 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/Wikibase/data-access/src/EntitySourceDefinitions.php: EntitySourceDefinitions::getEntityTypeToSourceMapping fix for sub entities (T242415 T214557) (duration: 01m 08s)
  • 16:00 rlazarus: Starting etcd main cluster switchover from codfw to eqiad
  • 15:45 vgutierrez: restarting high-traffic1 && high-traffic2 primary LVSs - T236120 T238625
  • 15:32 vgutierrez: restarting secondary LVSs - T236120 T238625
  • 15:22 moritzm: mask uwsgi.service on debmonitor2001 T222874
  • 15:06 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4026.ulsfo.wmnet,service=nginx
  • 14:39 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=ats-tls,name=cp4026.ulsfo.wmnet
  • 14:17 marostegui: Remove wikiadmin2 user from codfw x1 hosts - T243512
  • 13:34 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:32 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:19 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:19 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:50 Amir1: EU SWAT is done
  • 12:49 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntitySourceBasedFederation true for testwiki (T243395) (duration: 01m 06s)
  • 12:47 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set EntitySourceBasedFederation true for testwiki (T243395) (duration: 01m 05s)
  • 12:46 Urbanecm: Run renameRestrictions.php 'autopatrol' 'editautopatrolprotected' for all Serbian wikis (T230103)
  • 12:44 Urbanecm: mwscript renameRestrictions.php --wiki=hewiki 'autopatrol' 'editautopatrolprotected' (T230103)
  • 12:44 Urbanecm: mwscript renameRestrictions.php --wiki=etwiki 'autopatrol' 'editautopatrolprotected' (T230103)
  • 12:41 urbanecm@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: 0c2fb70: Use editautopatrolprotected right for pages protected for autopatrollers (3/3; T230103) (duration: 01m 05s)
  • 12:39 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: 0c2fb70: Use editautopatrolprotected right for pages protected for autopatrollers (2/3; T230103) (duration: 01m 08s)
  • 12:35 Urbanecm: mwscript renameRestrictions.php --wiki=ckbwiki 'autopatrol' 'editautopatrolprotected' (T230103)
  • 12:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0c2fb70: Use editautopatrolprotected right for pages protected for autopatrollers; fixing broken cache (T230103) (duration: 01m 04s)
  • 12:31 twentyafterfour: Deploying hotfix for T243479, restarting php7.3-fpm on phab1003
  • 12:31 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0c2fb70: Use editautopatrolprotected right for pages protected for autopatrollers (T230103) (duration: 01m 06s)
  • 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 04s)
  • 12:14 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 06s)
  • 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Move CX out of beta for af, is, lv and ne WPs (T242011 T242012 T242014 T242016) (duration: 01m 05s)
  • 12:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Move CX out of beta for af, is, lv and ne WPs (T242011 T242012 T242014 T242016) (duration: 01m 08s)
  • 11:37 jbond42: updating order in resolve search list https://gerrit.wikimedia.org/r/c/operations/puppet/+/566567
  • 10:25 vgutierrez: depooling and reimaging cp4026 as buster - T242093
  • 09:13 moritzm: installing xen updates (only pulled in via deps, otherwise unused)
  • 08:46 marostegui: Stop mysql on es2024 to "clone" es2025 - T243052
  • 06:05 marostegui: Remove partitions from db1097:3314 - T239453
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 - T239453', diff saved to https://phabricator.wikimedia.org/P10248 and previous config saved to /var/cache/conftool/dbconfig/20200123-060308-marostegui.json
  • 05:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3314 - T239453', diff saved to https://phabricator.wikimedia.org/P10247 and previous config saved to /var/cache/conftool/dbconfig/20200123-055919-marostegui.json
  • 05:55 marostegui: Compress some tables on db1124:3318, this might generate lag on s8 labs - T232446
  • 01:40 jforrester@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/AbuseFilter/includes/AFComputedVariable.php: T243469 When no registration date is recorded, use 2008-01-15 (duration: 01m 08s)
  • 01:37 twentyafterfour: Phabricator deployment completed with no apparent issues.
  • 01:27 twentyafterfour: Deploying phabricator update tagged release/2020-01-23/1
  • 00:41 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync (duration: 01m 07s)
  • 00:40 RoanKattouw: Deployment freeze lifted

2020-01-22

  • 23:46 James_F: <RoanKattouw> T236104 happened again, and this time I'm leaving it broken so I can investigate. Please don't do any MW deployments (use scap) for now
  • 23:31 eileen: civicrm revision changed from 036b742316 to fbd5c35fb0, config revision is 74a355670a
  • 23:28 eileen: civicrm revision changed from 7595104180 to 036b742316, config revision is 74a355670a
  • 23:14 eileen: civicrm revision changed from c74092ad63 to 7595104180, config revision is 74a355670a
  • 23:06 XioNoX: configure flowspec on cr3-knams
  • 22:39 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable homepage on ukwiki, huwiki, hywiki (T238320, T231720, T230478, T230676) (duration: 01m 05s)
  • 22:30 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable help panel on ukwiki, huwiki, hywiki (T238319, T231720, T230478, T230676) (duration: 01m 04s)
  • 22:19 catrope@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/CodeReview/: T243337 (duration: 01m 06s)
  • 22:13 catrope@deploy1001: Finished scap: i18n changes for SWAT: Special page aliases for GrowthExperiments (T230676); messages for machinevision-tester group (T243440); fix namespace names for atj (T243125) (duration: 40m 48s)
  • 21:32 catrope@deploy1001: Started scap: i18n changes for SWAT: Special page aliases for GrowthExperiments (T230676); messages for machinevision-tester group (T243440); fix namespace names for atj (T243125)
  • 21:28 arlolra: Updated Parsoid to 7390988 (T242513, T243008, T241146)
  • 21:18 arlolra@deploy1001: Finished deploy [parsoid/deploy@e8610ff]: Updating Parsoid to 7390988 (duration: 08m 28s)
  • 21:10 arlolra@deploy1001: Started deploy [parsoid/deploy@e8610ff]: Updating Parsoid to 7390988
  • 20:07 brennen@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.16 (duration: 01m 05s)
  • 20:06 brennen@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.16
  • 19:46 catrope@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/WikimediaMessages/: Remove temporary partial block banner (T240300) (duration: 01m 06s)
  • 19:45 catrope@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/WikimediaMessages/: Remove temporary partial block banner (T240300) (duration: 01m 10s)
  • 19:43 gehel: restart tilerator / kartotherian on maps* servers
  • 19:36 catrope@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/WikimediaEvents/: InukaPageView: update schema version (T238029) (duration: 01m 07s)
  • 19:26 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable UnderstandingFirstDay on ukwiki, huwiki, hywiki (T238294) (duration: 01m 06s)
  • 17:46 arturo: forcing by hand the first sync on sodium for openstack packages (T238820)
  • 16:40 vgutierrez: removing nginx from the caching cluster
  • 16:26 moritzm: installing tiff security updates for buster
  • 16:21 vgutierrez: copied prometheus-trafficserver-exporter from stretch to buster on apt.w.o - T242093
  • 16:13 XioNoX: update logging target for pfw3-eqiad - T243343
  • 16:07 XioNoX: update logging target for pfw3-codfw - T243343
  • 15:43 vgutierrez: uploaded vhtcpd 0.1.2-2 to apt.w.o (buster) - T242093
  • 15:38 marostegui: Compress wikidatawiki.wbt_text wikidatawiki.wbt_text_in_lang on db1124:3318 (this might cause lag on s8 labs) - T232446
  • 15:29 vgutierrez: uploaded fifo-log-demux 0.6.1 to apt.w.o (buster) - T242093
  • 14:54 papaul: FW upgrade on db2085
  • 14:53 vgutierrez: copied python3-logstash to apt.w.o (buster) - T242093
  • 14:50 vgutierrez: copied python3-file-read-backwards to apt.w.o (buster) - T242093
  • 14:39 marostegui: Stop MySQL on db2085:3311 and db2085:3318 for onsite maintenance - T243148
  • 14:18 akosiaris: upload etherpad-lite_1.7.5-3 to apt.wikimedia.org buster-wikimedia/main T224580
  • 13:07 Amir1: EU SWAT is over
  • 13:03 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 05s)
  • 13:02 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 05s)
  • 12:59 effie: restart nrpe on notebook1003
  • 12:57 hoo: Updated the Wikidata property suggester with data from the 2020-01-06 JSON dump and applied the T132839 workarounds
  • 12:51 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 05s)
  • 12:50 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 06s)
  • 12:47 jbond42: disable puppet fleet-wide - upgrade jdk on puppetdb
  • 12:46 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/WikibaseQualityConstraints: Better dependency injection of base URI in ConstraintParameterParser (T241972) (duration: 01m 05s)
  • 12:43 ladsgroup@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 12:36 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.16/extensions/WikibaseQualityConstraints: Better dependency injection of base URI in ConstraintParameterParser (T241972) (duration: 01m 14s)
  • 12:35 effie: enable puppet and restart mtail on mw* and wtp*
  • 12:30 vgutierrez: uploaded trafficserver 8.0.5-1wm13 to apt.w.o (buster) - T242093
  • 12:17 effie: Disable puppet on mw* and wtp* to merge 563206
  • 12:15 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 12:14 jmm@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:40 moritzm: restarting apache on puppetboard/graphite/webperf to pick up OpenLDAP update
  • 11:38 cormacparle__: disabled wikitech 2fa for Cparle
  • 11:16 moritzm: restarting exim on MXes to pick up new openldap
  • 11:04 moritzm: restarting mw canaries to pick up openldap update
  • 10:09 marostegui: Stop MySQL on es2023 to "clone" es2024 - T243052
  • 10:04 moritzm: installing openldap security updates on stretch
  • 08:45 moritzm: upload prometheus-etherpad-exporter 0.2 to buster-wikimedia T224580
  • 08:27 marostegui: Stop MySQL on es2021 to "clone" es2023 - T243052
  • 06:16 marostegui: Remove partitions from db1103:3314 - T239453
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10242 and previous config saved to /var/cache/conftool/dbconfig/20200122-061522-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10241 and previous config saved to /var/cache/conftool/dbconfig/20200122-061429-marostegui.json
  • 01:13 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: resync, the last sync only took on half the appservers (duration: 01m 05s)
  • 00:31 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable topics in suggested edits on cswiki, kowiki, arwiki, viwiki (duration: 01m 05s)
  • 00:26 catrope@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/GrowthExperiments/: SWAT for T242811, T242052 (duration: 01m 05s)

2020-01-21

  • 20:09 brennen@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.16
  • 19:59 mutante: puppet-compilers: syncing facts from puppetmasters to 3 compiler instances
  • 19:55 XioNoX: restart mr1-esams for software upgrade - T242097
  • 19:46 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@1ca3071]: Add separate rule for machine vision jobs T241072 (duration: 01m 11s)
  • 19:45 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@1ca3071]: Add separate rule for machine vision jobs T241072
  • 19:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:39 XioNoX: mr1-esams> request system software add /var/tmp/junos-srxsme-18.2R3-S2... - T242097
  • 19:39 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:38 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:22 XioNoX: cr3-knams# set routing-options ppm no-delegate-processing - T240659
  • 19:01 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 19:00 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 19:00 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 18:59 andrew@cumin1001: START - Cookbook sre.hosts.decommission
  • 18:50 brennen@deploy1001: Finished scap: testwiki to php-1.35.0-wmf.16 and rebuild l10n cache (duration: 30m 27s)
  • 18:19 brennen@deploy1001: Started scap: testwiki to php-1.35.0-wmf.16 and rebuild l10n cache
  • 17:45 XioNoX: add dwisehaupt user to pfw/fasw - T242758
  • 17:44 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@986769c]: bulk_daemon: Treat model exists as unrecoverable failure (duration: 05m 42s)
  • 17:39 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@986769c]: bulk_daemon: Treat model exists as unrecoverable failure
  • 17:37 bstorm_: re-exported NFS from labstore1006/7
  • 17:33 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@ae77f9d]: Deploy ores_drafttopics dag (duration: 00m 22s)
  • 17:32 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@ae77f9d]: Deploy ores_drafttopics dag
  • 17:20 brennen: starting branch cut for T233864
  • 17:08 XioNoX: restart pfw3-eqiad for software upgrade
  • 16:45 XioNoX: install software upgrade on pfw3a-eqiad (primary, no restart yet)
  • 16:35 XioNoX: install software upgrade on pfw3b-eqiad (secondary, no restart yet)
  • 16:15 vgutierrez: copied prometheus-varnishkafka-exporter from stretch to buster on apt.w.o - T242093
  • 16:02 vgutierrez: uploaded libvmod-tbf 2.0.91-2wm to apt.w.o (buster) - T242093
  • 14:57 vgutierrez: uploaded libvmod-re2 1.3.1-3 to apt.w.o (buster) - T242093
  • 14:56 vgutierrez: uploaded libvmod-netmapper 1.7-3 to apt.w.o (buster) - T242093
  • 14:39 moritzm: stopping/masking tor on torrelay1001 T243288
  • 14:38 effie: Rolling restart all eqiad mw api servers
  • 14:37 vgutierrez: uploaded varnish-modules 0.12-1+wmf2 to apt.w.o (buster) - T242093
  • 14:36 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:36 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 14:36 _joe_: restart pybal on low-traffic eqiad to pick up new configuration
  • 14:33 cdanis@cumin2001: conftool action : set/weight=30; selector: cluster=api_appserver,dc=eqiad,service=apache2,name=mw13.*
  • 14:33 cdanis@cumin2001: conftool action : set/weight=30; selector: cluster=api_appserver,dc=eqiad,service=nginx,name=mw13.*
  • 14:30 cdanis@cumin2001: conftool action : set/weight=15; selector: cluster=api_appserver,dc=eqiad,service=nginx,name=mw12[23].*
  • 14:24 _joe_: restarting pybal on lvs low-traffic in codfw
  • 14:02 oblivian@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: service=kubesvc,cluster=kubernetes
  • 13:24 marostegui: Clean up some gerrit grants on db1132 (m2 master) T233714
  • 13:00 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 12:29 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 00m 58s)
  • 12:28 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Revert Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 00s)
  • 12:21 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'zotero' for release 'production' .
  • 12:19 vgutierrez: upgrading pybal on esams and eqiad - T169765
  • 12:12 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 00m 59s)
  • 12:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set useEntitySourceBasedFederation to true for Wikidata (T241972) (duration: 01m 12s)
  • 11:56 vgutierrez: upgrading pybal on eqsin and codfw - T169765
  • 11:54 vgutierrez: restarting pybal instances on eqsin
  • 11:52 _joe_: restarting etcd on conf2003 to test new pybal reconnection. Issues expected for pybal in eqsin, but not in ulsfo
  • 11:44 jbond42: importing puppet-master packages to component/puppet5
  • 11:39 mvolz@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 11:24 vgutierrez: Updating pybal to 1.15.7 on ulsfo load balancers - T169765
  • 11:23 vgutierrez: uploaded pybal 1.15.7 to apt.w.o (stretch) - T169765
  • 11:22 mvolz@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 10:47 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 10:40 mvolz@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 10:38 akosiaris@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'zotero' for release 'staging' .
  • 10:36 godog: roll-restart thumbor after https://gerrit.wikimedia.org/r/c/operations/puppet/+/566069
  • 10:05 volans@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:05 volans@cumin2001: START - Cookbook sre.hosts.downtime
  • 07:34 oblivian@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 07:29 oblivian@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'citoid' for release 'production' .
  • 07:23 _joe_: adding TLS to citoid in production
  • 07:20 oblivian@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'citoid' for release 'staging' .
  • 06:28 marostegui: Remove the following users from phabricator database: 'phadmin'@'10.64.48.21' 'phuser'@'10.64.48.21' 'phstats'@'10.64.48.21' 'phmanifest'@'10.64.48.21' T238957
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1087', diff saved to https://phabricator.wikimedia.org/P10233 and previous config saved to /var/cache/conftool/dbconfig/20200121-061932-marostegui.json
  • 06:19 marostegui: Aborted upgrade on db1087 (wiki dumps are running)
  • 06:18 marostegui: Upgrade db1087
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1087 for upgrade', diff saved to https://phabricator.wikimedia.org/P10232 and previous config saved to /var/cache/conftool/dbconfig/20200121-061756-marostegui.json
  • 06:05 marostegui: Stop replication on db1107
  • 05:58 marostegui: Stop MySQL on es2021 to clone es2022 - T243052
  • 05:52 marostegui: Remove partitions from db2091:3314 - T239453
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10231 and previous config saved to /var/cache/conftool/dbconfig/20200121-055149-marostegui.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2084:3314 T239453', diff saved to https://phabricator.wikimedia.org/P10230 and previous config saved to /var/cache/conftool/dbconfig/20200121-055023-marostegui.json

2020-01-20

  • 16:14 Urbanecm: Change email assigned to User:Sadsadas (T243222)
  • 15:28 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@2a1f493]: Update mobileapps to 1848cf5 (duration: 05m 55s)
  • 15:22 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@2a1f493]: Update mobileapps to 1848cf5
  • 15:20 vgutierrez: rolling upgrade of ats to version 8.0.5-1wm12 - T242620 T242778
  • 15:03 vgutierrez: uploaded trafficserver 8.0.5-1wm12 to apt.wm.o (stretch) - T242620 T242778
  • 13:06 jbond42_: add SSL validation to conftool/etcd expected no-op (https://gerrit.wikimedia.org/r/c/operations/puppet/+/566009)
  • 12:45 vgutierrez: uploaded varnishkafka 1.0.14-1 to apt.wm.o (buster) - T242093
  • 12:25 elukey@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0)
  • 12:18 elukey@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper
  • 12:09 moritzm: removing actinium in Ganeti T224551
  • 12:08 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 12:07 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 11:43 moritzm: removing alsafi in Ganeti T224551
  • 11:41 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:40 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 11:32 jbond42_: reverting until joe's change is finished - add SSL validation to conftool/etcd expected no-op (https://gerrit.wikimedia.org/r/c/operations/puppet/+/561817)
  • 11:30 jbond42_: add SSL validation to conftool/etcd expected no-op (https://gerrit.wikimedia.org/r/c/operations/puppet/+/561817)
  • 11:14 vgutierrez: deploying wikiworkshop TLS certificate on the text cluster - T242374
  • 10:06 moritzm: removing alcyone/aluminium in Ganeti T224551
  • 10:04 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:04 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 10:01 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 10:01 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1129', diff saved to https://phabricator.wikimedia.org/P10225 and previous config saved to /var/cache/conftool/dbconfig/20200120-094445-marostegui.json
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129', diff saved to https://phabricator.wikimedia.org/P10224 and previous config saved to /var/cache/conftool/dbconfig/20200120-093603-marostegui.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129', diff saved to https://phabricator.wikimedia.org/P10223 and previous config saved to /var/cache/conftool/dbconfig/20200120-092642-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1129', diff saved to https://phabricator.wikimedia.org/P10222 and previous config saved to /var/cache/conftool/dbconfig/20200120-091929-marostegui.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1094', diff saved to https://phabricator.wikimedia.org/P10221 and previous config saved to /var/cache/conftool/dbconfig/20200120-090850-marostegui.json
  • 09:06 marostegui: Upgrade db1129
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1129', diff saved to https://phabricator.wikimedia.org/P10220 and previous config saved to /var/cache/conftool/dbconfig/20200120-090617-marostegui.json
  • 09:05 moritzm: restarting CAS to pick up Java security updates
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P10219 and previous config saved to /var/cache/conftool/dbconfig/20200120-090336-marostegui.json
  • 09:01 moritzm: installing Java security updates on an-conf*
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P10218 and previous config saved to /var/cache/conftool/dbconfig/20200120-085537-marostegui.json
  • 08:51 marostegui: Upgrade db1139:3311 db1139:3316
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1094', diff saved to https://phabricator.wikimedia.org/P10217 and previous config saved to /var/cache/conftool/dbconfig/20200120-084908-marostegui.json
  • 08:44 marostegui: Upgrade db1094
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1094', diff saved to https://phabricator.wikimedia.org/P10216 and previous config saved to /var/cache/conftool/dbconfig/20200120-084408-marostegui.json
  • 08:10 marostegui: Compare data on db2085:3318 - T243148
  • 08:07 ema: powercycle cp3061 T238305
  • 07:15 marostegui: Remove partitions from revision on db2084:3314 T239453
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2084 T239453', diff saved to https://phabricator.wikimedia.org/P10215 and previous config saved to /var/cache/conftool/dbconfig/20200120-071513-marostegui.json
  • 07:10 marostegui: Stop MySQL on es2020 to clone es2021 - T243052
  • 06:09 marostegui: Stop replication on db1107
  • 06:08 marostegui: Compress db1121 - T232446
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121, pool db1084 into vslow T232446', diff saved to https://phabricator.wikimedia.org/P10214 and previous config saved to /var/cache/conftool/dbconfig/20200120-060759-marostegui.json

2020-01-19

  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2085:3311, db2085:3318 T243148', diff saved to https://phabricator.wikimedia.org/P10210 and previous config saved to /var/cache/conftool/dbconfig/20200119-120236-marostegui.json
  • 11:20 elukey: restart-php-fpm on mw2181 to rule out temporary php-related issues in codfw
  • 00:46 cdanis: T238305 cp3053.mgmt /admin1-> racadm serveraction hardreset

2020-01-18

  • off: upgraded spicerack to 0.0.29 on cumin hosts
  • 09:00 dcausse: repool wdqs1007 (T242453)
  • 07:05 marostegui: Remove partitions from enwiki.revision on db2085 T239453
  • 04:15 cdanis: cp3065.mgmt: /admin1-> racadm serveraction hardreset T238305

2020-01-17

  • 21:56 urandom: bootstrapping restbase2023-c - T243000
  • 21:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 20:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 20:07 urandom: bootstrapping restbase2023-b - T243000
  • 20:01 bblack: reset bgp peerings with gfiber on cr2-eqiad
  • 19:14 mutante: gerrit - switching operations/debs/hhvm to READONLY mode and adding ARCHIVED to description (T237038)
  • 18:19 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 18:18 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 17:15 urandom: bootstrapping restbase2023-a - T243000
  • 16:33 marostegui: Stop replication on db1107
  • 16:25 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@938d253]: Move weekly elasticsearch transfer to airflow (duration: 00m 21s)
  • 16:25 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@938d253]: Move weekly elasticsearch transfer to airflow
  • 14:31 urandom: bootstrapping restbase2022-c - T243000
  • 14:09 awight@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/Cite: UBN backport: Fix for nested #tag:references and empty name (T242437) (duration: 00m 57s)
  • 14:03 awight: beginning Friday deployment for UBN, T242437
  • 13:38 moritzm: masking squid3 on old URL downloaders T224551
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:35 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:35 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 12:55 effie: Upgrade netmon* to PHP 7.2.26 and restart - T241222
  • 11:48 moritzm: upgrading PHP 7.2 on netmon* (also apache restart for SSL update)
  • 11:13 elukey: restart nginx on analytics tool hosts to pick up openssl updates
  • 11:05 moritzm: restarting apache on matomo1001 to pick up SSL updates
  • 11:04 XioNoX: Running homer to remove decom cloud vlans in eqiad/codfw - T240670
  • 11:01 XioNoX: delete vlan cloud-instances1-b-eqiad from asw2-b-eqiad - T240670
  • 10:43 moritzm: restarting apache on miscweb* to pick up SSL updates
  • 10:39 moritzm: restarting apache on puppetboard* to pick up SSL updates
  • 10:32 moritzm: installing remaining OpenSSL 1.0.2 updates
  • 09:25 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:25 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2103', diff saved to https://phabricator.wikimedia.org/P10202 and previous config saved to /var/cache/conftool/dbconfig/20200117-085808-marostegui.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1082', diff saved to https://phabricator.wikimedia.org/P10201 and previous config saved to /var/cache/conftool/dbconfig/20200117-075125-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P10200 and previous config saved to /var/cache/conftool/dbconfig/20200117-074626-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1081', diff saved to https://phabricator.wikimedia.org/P10199 and previous config saved to /var/cache/conftool/dbconfig/20200117-073954-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P10198 and previous config saved to /var/cache/conftool/dbconfig/20200117-073917-marostegui.json
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1082', diff saved to https://phabricator.wikimedia.org/P10197 and previous config saved to /var/cache/conftool/dbconfig/20200117-072544-marostegui.json
  • 07:10 marostegui: Stop and upgrade db1082
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2019', diff saved to https://phabricator.wikimedia.org/P10193 and previous config saved to /var/cache/conftool/dbconfig/20200117-070636-marostegui.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es2012', diff saved to https://phabricator.wikimedia.org/P10192 and previous config saved to /var/cache/conftool/dbconfig/20200117-070516-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2012', diff saved to https://phabricator.wikimedia.org/P10191 and previous config saved to /var/cache/conftool/dbconfig/20200117-070320-marostegui.json
  • 06:35 marostegui: Compress db1125:3314 tables - this will create lag on s4 labs hosts
  • 06:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1081', diff saved to https://phabricator.wikimedia.org/P10190 and previous config saved to /var/cache/conftool/dbconfig/20200117-062838-marostegui.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1081', diff saved to https://phabricator.wikimedia.org/P10189 and previous config saved to /var/cache/conftool/dbconfig/20200117-061602-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1081', diff saved to https://phabricator.wikimedia.org/P10188 and previous config saved to /var/cache/conftool/dbconfig/20200117-060259-marostegui.json
  • 02:45 urandom: bootstrapping restbase2022-b - T243000
  • 00:45 mutante: urldownloaders - rm /etc/logrotate.d/squid3 ; systemctl start logrotate (this fixes failed logrotate because of squid3 vs squid file = duplicate entry, but puppet will recreate it)
  • 00:33 urandom: bootstrapping restbase2022-a - T243000

2020-01-16

  • 22:38 mutante: ganeti1003 - deleting VM gerrit-test (T239151)
  • 22:37 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97)
  • 22:37 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:34 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 22:34 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 22:22 urandom: bootstrapping restbase2021-c - T243000
  • 20:40 mforns@deploy1001: Finished deploy [analytics/refinery@26a587a] (thin): deploying analytics-refinery to accompany refinery-source v0.0.112 (duration: 00m 07s)
  • 20:40 mforns@deploy1001: Started deploy [analytics/refinery@26a587a] (thin): deploying analytics-refinery to accompany refinery-source v0.0.112
  • 20:37 mforns@deploy1001: Finished deploy [analytics/refinery@26a587a]: deploying analytics-refinery to accompany refinery-source v0.0.112 (duration: 14m 06s)
  • 20:29 jforrester@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/CentralAuth/includes/GlobalRename/GlobalRenameBlacklist.php: Special:GlobalRenameRequest: Initialize blacklist even if empty T242974 (duration: 00m 57s)
  • 20:23 mforns@deploy1001: Started deploy [analytics/refinery@26a587a]: deploying analytics-refinery to accompany refinery-source v0.0.112
  • 20:13 urandom: bootstrapping restbase2021-b - T243000
  • 20:01 Urbanecm: Purge 12 logos URLs (T150618)
  • 20:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5a32bde: Add logos to IS.php (T150618) (duration: 00m 56s)
  • 19:58 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: b558eea: Fix mistakes in HD logos (T150618) (duration: 00m 56s)
  • 19:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable topic search, behind a hidden preference (T242698) (duration: 00m 56s)
  • 19:15 arlolra@deploy1001: Finished deploy [parsoid/deploy@7bf9819]: (no justification provided) (duration: 07m 13s)
  • 19:08 arlolra@deploy1001: Started deploy [parsoid/deploy@7bf9819]: (no justification provided)
  • 19:08 catrope@deploy1001: Synchronized wmf-config/CommonSettings.php: Remove kask-echoseen-transition definition, now unused (T234963) (duration: 01m 35s)
  • 19:05 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Echo: switch entirely to Kask, remove Redis fallback (T234963) (duration: 00m 56s)
  • 19:02 arlolra@deploy1001: Finished deploy [parsoid/deploy@7bf9819]: Updating Parsoid to 02f0066 (duration: 08m 30s)
  • 18:54 arlolra@deploy1001: Started deploy [parsoid/deploy@7bf9819]: Updating Parsoid to 02f0066
  • 18:01 urandom: bootstrapping restbase2021-a - T243000
  • 17:48 James_F: Manually purged the trwiki logos from Varnish as part of updating them to reflect unblocking, T242977
  • 17:48 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-2x.png: [trwiki] Change logo to reflect unblocking, 2x T242977 (duration: 00m 56s)
  • 17:47 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki-1.5x.png: [trwiki] Change logo to reflect unblocking, 1.5x T242977 (duration: 00m 55s)
  • 17:46 jforrester@deploy1001: Synchronized static/images/project-logos/trwiki.png: [trwiki] Change logo to reflect unblocking, 1x T242977 (duration: 00m 56s)
  • 17:39 effie: Upgrade parsoid to php 7.2.26 and restart - T241222
  • 17:05 dcausse: restarting blazegraph@wdqs1007 (T242453)
  • 17:02 jakob@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:53 jakob@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'termbox' for release 'production' .
  • 16:51 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'staging' .
  • 16:31 dcausse: depooling wdqs1007, blazegraph stuck (T242453)
  • 16:30 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:59 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:59 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:59 effie: Upgrade appservers and api to php 7.2.26 and restart - T241222
  • 15:16 elukey@deploy1001: Finished deploy [analytics/superset/deploy@16a1644]: Upgrade to superset 0.35.2 (duration: 00m 40s)
  • 15:15 elukey@deploy1001: Started deploy [analytics/superset/deploy@16a1644]: Upgrade to superset 0.35.2
  • 15:04 vgutierrez: rolling restart of ats-tls. This effectively disables TLSv1/1.1 across the caching cluster - T238038
  • 14:28 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1097:3314 db1097:3315', diff saved to https://phabricator.wikimedia.org/P10182 and previous config saved to /var/cache/conftool/dbconfig/20200116-142800-marostegui.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 db1097:3315', diff saved to https://phabricator.wikimedia.org/P10181 and previous config saved to /var/cache/conftool/dbconfig/20200116-140501-marostegui.json
  • 14:04 liw@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.15
  • 13:57 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 db1097:3315', diff saved to https://phabricator.wikimedia.org/P10180 and previous config saved to /var/cache/conftool/dbconfig/20200116-135659-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1097:3314 db1097:3315', diff saved to https://phabricator.wikimedia.org/P10179 and previous config saved to /var/cache/conftool/dbconfig/20200116-134801-marostegui.json
  • 13:37 marostegui: Upgrade db1097:3314 db1097:3315
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3314 db1097:3315', diff saved to https://phabricator.wikimedia.org/P10178 and previous config saved to /var/cache/conftool/dbconfig/20200116-133515-marostegui.json
  • 13:30 moritzm: restarting Swift frontends to pick up OpenSSL security update
  • 13:09 Urbanecm: EU SWAT done late
  • 13:02 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: aedd2c4: Add HD logos to IS.php (duration: 01m 04s)
  • 13:00 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 940b9a2: Add wgLogoHD entry for fa, te wikiquote & fr wikisource in IS.php (duration: 01m 05s)
  • 12:54 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: Sync project logos (duration: 01m 06s)
  • 12:51 XioNoX: remove BGP sessions to AS22652 in eqiad (left the IX)
  • 12:45 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P10176 and previous config saved to /var/cache/conftool/dbconfig/20200116-124516-marostegui.json
  • 12:42 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7381446: Add `Tutoriel` namespace for French Wiktionary (T242102) (duration: 01m 04s)
  • 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10175 and previous config saved to /var/cache/conftool/dbconfig/20200116-123841-marostegui.json
  • 12:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:33 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 65e17eb: Configure GlobalRename blacklist (T101615) (duration: 01m 05s)
  • 12:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop writing to wb_terms for properties in Test Wikidata (T225054) (duration: 01m 05s)
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10174 and previous config saved to /var/cache/conftool/dbconfig/20200116-122806-marostegui.json
  • 12:23 effie: restart php-fpm on labweb*
  • 12:19 Amir1: "delete from testwikidatawiki.wb_terms where term_full_entity_id like 'P%'" (T219301 T225054)
  • 12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Another sync for the IS.php cache issue (duration: 01m 04s)
  • 12:16 effie: Upgrade jobrunners to php 7.2.26 and restart - T241222
  • 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Stop writing to wb_terms for properties in Test Wikidata (T225054) (duration: 01m 04s)
  • 12:14 moritzm: installing OpenSSL security updates
  • 12:10 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set read for items in Wikidata for new term store up to Q8M (T225057) (duration: 01m 07s)
  • 11:59 _joe_: delete mediawiki-core images from october 2019 T242775
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10172 and previous config saved to /var/cache/conftool/dbconfig/20200116-115420-marostegui.json
  • 11:28 _joe_: uploading docker-report 0.0.3
  • 11:27 akosiaris: delete etcd100{4,5,6} from netbox. T239835
  • 11:27 akosiaris: delete etcd100{4,5,6} from ganeti01.svc.eqiad.wmnet. T239835
  • 11:22 elukey: import packages in stretch-wikimedia's thirdparty/bigtop14 component
  • 11:20 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 11:18 volans: uploaded spicerack_0.0.29-1_amd64.deb to apt.wikimedia.org stretch-wikimedia
  • 11:17 vgutierrez: restarting pybal on lvs5001 (high-traffic1) - T242321
  • 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:16 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99)
  • 11:16 akosiaris@cumin1001: START - Cookbook sre.hosts.decommission
  • 11:13 vgutierrez: restarting pybal on lvs5003 (secondary LVS) - T242321
  • 11:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=nginx,name=ncredir5002.eqsin.wmnet
  • 11:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes:weight=1; selector: service=nginx,name=ncredir5001.eqsin.wmnet
  • 09:24 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1100', diff saved to https://phabricator.wikimedia.org/P10171 and previous config saved to /var/cache/conftool/dbconfig/20200116-092409-root.json
  • 09:18 effie: restart php-fpm on mwmaint2001.codfw.wmnet,mwmaint1002.eqiad.wmnet,scandium.eqiad.wmnet
  • 09:16 effie: Upgrade mwmaint2001.codfw.wmnet,mwmaint1002.eqiad.wmnet,scandium.eqiad.wmnet to php 7.2.26 - T241222
  • 09:09 effie: restart php-fpm on cloudweb2001-dev.wikimedia.org,labweb[1001-1002].wikimedia.org
  • 09:02 effie: Upgrade cloudweb2001-dev.wikimedia.org,labweb[1001-1002].wikimedia.org to php 7.2.26 - T241222
  • 08:55 ema: cp3063: ats-backend-restart to clear things up after traffic_server crash T242952
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P10170 and previous config saved to /var/cache/conftool/dbconfig/20200116-085047-marostegui.json
  • 08:39 effie: Upgrade deploy*, snapshot* to php 7.2.26 - T241222
  • 08:27 moritzm: installing OpenSSL security updates on Parsoid hosts
  • 08:20 XioNoX: reject RPKI invalids in eqord/eqiad - T220669
  • 08:05 _joe_: deleting mediawiki-core docker images from september 2019 from the registry, T242775
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1110', diff saved to https://phabricator.wikimedia.org/P10169 and previous config saved to /var/cache/conftool/dbconfig/20200116-073012-marostegui.json
  • 07:22 marostegui: Upgrade db1110
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110', diff saved to https://phabricator.wikimedia.org/P10168 and previous config saved to /var/cache/conftool/dbconfig/20200116-072219-marostegui.json
  • 06:58 marostegui: stop db1107 and db1080 replication in sync
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080', diff saved to https://phabricator.wikimedia.org/P10166 and previous config saved to /var/cache/conftool/dbconfig/20200116-065505-marostegui.json
  • 02:46 Krinkle: krinkle@mwmaint1002 Change code_repo.repo_viewvc from 'http://svn.wikimedia.org/viewvc/pywikipedia' to for repo_id 2 (pywikipedia) for. Ref 2162cf2fc46cfe.
  • 02:35 Krinkle: krinkle@mwmaint1002 Change code_repo.repo_viewvc from 'https://svn.wikimedia.org/viewvc/mediawiki' to for 'MediaWiki' repo_name. Ref 2162cf2fc46cfe, T205361.
  • 00:40 bstorm_: set max_connections on db1133 (m5-master) back to 500 since the neutron connections seem fairly stable now T242817
  • 00:23 catrope@deploy1001: Synchronized static/images/project-logos/: Restore pre-censorship trwiki logos (T242932) (duration: 01m 05s)
  • 00:17 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: GrowthExperiments: Enable topics for suggested edits on testwiki (duration: 01m 04s)

2020-01-15

  • 22:40 mutante: phabricator - disabling 'bzimport' user (T242860)
  • 21:03 jforrester@deploy1001: Synchronized php-1.35.0-wmf.14/languages/messages/MessagesMrj.php: Fix fallbacks of mrj (Hill Mari) T242409 T242796 (duration: 01m 05s)
  • 20:47 mutante: gerrit - adding Zoranzoki to members of extension-GoogleAdSense (endorsed by extension owner Siebrand) (T241509)
  • 20:28 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Touched IS.php for sync (duration: 01m 05s)
  • 20:27 jforrester@deploy1001: sync-file aborted: Enable partial blocks on last wiki, (duration: 00m 01s)
  • 20:17 krinkle@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/MultimediaViewer/resources/: T229484 (duration: 01m 06s)
  • 19:57 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable partial blocks on last wiki, Commons T242570 (duration: 01m 03s)
  • 19:54 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable banner for wikis that recently opted in to partial blocks T240300 T242570 T242569 (duration: 01m 05s)
  • 18:10 anomie@deploy1001: Synchronized wmf-config/CommonSettings.php: Set OAuth 2 access token expiry to "infinity" (duration: 01m 04s)
  • 17:50 anomie@deploy1001: Synchronized private/PrivateSettings.php: Setting RSA keys for OAuth 2.0 (T242872) (duration: 01m 05s)
  • 16:27 elukey: import key 0xDBBF9D42B7B4BD70 (Apache BigTop) manually on install1002's gpg
  • 15:55 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/WikibaseQualityConstraints/extension.json: Fix service injection for special page (T242846) (duration: 01m 08s)
  • 15:40 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.15/extensions/Wikibase/client/includes/Api/PageTerms.php: Fix invalid iteration over false in PageTerms (T242856) (duration: 01m 06s)
  • 15:37 vgutierrez: rolling restart of ats-tls instances - T196558 T242778
  • 15:28 ema: cp3064: ats-tls-restart to apply https://gerrit.wikimedia.org/r/559711 T196558
  • 15:20 moritzm: installing OpenSSL security updates on db* hosts
  • 15:02 moritzm: installing OpenSSL security updates on mw*
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1252.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1251.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1250.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1249.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1248.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1247.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1246.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1245.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1244.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1243.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1242.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1241.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1240.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1239.eqiad.wmnet
  • 14:54 jiji@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1238.eqiad.wmnet
  • 14:54 effie: lower weights on slower servers mw1238-mw1252
  • 14:53 effie: pool mw1238, mw1240, mw1246
  • 14:44 XioNoX: reject RPKI invalids in dfw - T220669
  • 14:30 moritzm: rolling restart of FPM on mw1261-mw1265 to pick up OpenSSL security update
  • 14:25 XioNoX: reject RPKI invalids in ams - T220669
  • 14:18 godog: reenable puppet on cp hosts, after https://gerrit.wikimedia.org/r/c/operations/puppet/+/563430 deployment
  • 14:08 effie: depool mw1238, mw1240, mw1246
  • 14:06 liw@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.15 (duration: 01m 07s)
  • 14:05 liw@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.15
  • 13:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:56 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 akosiaris@deploy1001: helmfile [EQIAD] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:54 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:53 akosiaris: update calico policy on eqiad/codfw/staging. Add new urldownloaders. T224551
  • 13:52 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'kube-system' for release 'calico-policy-controller' .
  • 13:02 _joe_: restarting gerrit
  • 12:50 XioNoX: reject RPKI invalids in eqsin - T220669
  • 12:38 vgutierrez: Pooling ulsfo for ncredir service - T242321
  • 12:27 awight: EU SWAT done
  • 12:24 awight@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/Cite: SWAT: Don't fail with a LogicException during section preview (T242434) (duration: 01m 10s)
  • 12:22 vgutierrez: upgrading ats on cp4026, cp4032, cp5006 and cp5012 - T242778 T242620
  • 12:06 XioNoX: reject RPKI invalids in ulsfo - T220669
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1112', diff saved to https://phabricator.wikimedia.org/P10161 and previous config saved to /var/cache/conftool/dbconfig/20200115-115826-marostegui.json
  • 11:36 elukey: restart all varnishkafka daemons on cp4031
  • 11:09 legoktm: added SonarQubeBot to "Non-Interactive Users" group on Gerrit
  • 10:38 moritzm: installing openssl1.0 updates on stretch (update to 1.0.2u)
  • 10:08 ema: cache: rolling varnish-frontend-restart to add CAP_KILL to varnish-frontend.service T242411
  • 09:56 vgutierrez: repooling cp5012
  • 09:46 vgutierrez: depooling cp5012 for some ats parent select tests
  • 09:42 XioNoX: enable ping offload in esams - T190090
  • 09:32 marostegui: Deploy schema change on x1 eqiad hosts T242749
  • 09:19 elukey: roll-restart druid brokers on druid100[4-6] - locked up after segments deletion
  • 09:11 marostegui: Deploy schema change on x1 codfw - T242749
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10160 and previous config saved to /var/cache/conftool/dbconfig/20200115-085145-marostegui.json
  • 08:44 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 08:40 godog: roll restart ores in codfw/eqiad to apply logging pipeline changes
  • 08:40 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:40 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 08:40 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 08:23 godog: roll restart ores in codfw/eqiad to apply logging pipeline changes
  • 08:13 godog: testing ores logging to pipeline on ores2001
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10159 and previous config saved to /var/cache/conftool/dbconfig/20200115-070201-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10158 and previous config saved to /var/cache/conftool/dbconfig/20200115-065353-marostegui.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1080', diff saved to https://phabricator.wikimedia.org/P10157 and previous config saved to /var/cache/conftool/dbconfig/20200115-065305-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10156 and previous config saved to /var/cache/conftool/dbconfig/20200115-064606-marostegui.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1098:3316 and db1098:3317', diff saved to https://phabricator.wikimedia.org/P10155 and previous config saved to /var/cache/conftool/dbconfig/20200115-064535-marostegui.json
  • 06:25 marostegui: Upgrade db1098:3316 and db1098:3317
  • 06:23 mholloway-shell@deploy1001: Synchronized wmf-config/InitialiseSettings.php: MachineVision: Make testcommonswiki behavior consistent with commonswiki (duration: 01m 16s)
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 db1098:3317 for upgrade', diff saved to https://phabricator.wikimedia.org/P10152 and previous config saved to /var/cache/conftool/dbconfig/20200115-062028-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10151 and previous config saved to /var/cache/conftool/dbconfig/20200115-061859-marostegui.json
  • 06:16 marostegui: Remove revision partitions from db2088:3311 - T239453
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10150 and previous config saved to /var/cache/conftool/dbconfig/20200115-061052-marostegui.json
  • 06:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10148 and previous config saved to /var/cache/conftool/dbconfig/20200115-060347-marostegui.json
  • 06:00 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae (duration: 05m 56s)
  • 05:54 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@3c5f615]: Update mobileapps to 7f507ae
  • 01:32 mutante: lvs1015 powercycling, crashed, nothing on console, lots of unknowns in icinga
  • 01:17 mutante: dbproxy1017 and dbproxy1021 were showing "haproxy failover" icinga alerts. Did the check described on https://wikitech.wikimedia.org/wiki/HAProxy#Failover and it claimed on both that db1133 was DOWN, but checking db1133 itself showed it was up and working normally. In that case the docs say to 'systemctl reload haproxy'. Done on both and things recovered
  • 01:13 mutante: dbproxy1017 - systemctl reload haproxy
  • 00:22 bstorm_: restarted maintain-dbusers on labstore1004 after recovering the m5 DB's connection issue
  • 00:12 bstorm_: set max_connections to 600 temporarily while troubleshooting on m5 (db1133)

2020-01-14

  • 20:11 milimetric@deploy1001: Finished deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version (duration: 04m 48s)
  • 20:07 milimetric@deploy1001: Started deploy [analytics/aqs/deploy@1cf0530]: Increment service-runner to latest version
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: e400916: [wikitech] Restore contentadmin ability to manage abuse filters (duration: 01m 05s)
  • 18:11 vgutierrez: repooling cp5012
  • 18:06 vgutierrez: depool cp5012 for some ats parent select debugging
  • 17:43 vgutierrez: repooling cp4027
  • 17:39 vgutierrez: depooling cp4027 for some ats-tls parent balancing tests
  • 17:21 _joe_: upload docker-report 0.0.2 to {buster,stretch}-wikimedia T242604
  • 16:53 liw@deploy1001: rebuilt and synchronized wikiversions files: group0 to 1.35.0-wmf.15
  • 16:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:44 liw: branch is cut for 1.35.0-wmf.15; train window is closed, but I'll continue the train since the next time slot seems to not have anything
  • 16:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:41 marostegui: Enable puppet back on install1002 and install2002 - T242481
  • 16:31 liw@deploy1001: Finished scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2) (duration: 43m 29s)
  • 16:26 marostegui: Temporarily disable puppet on install1002 and install2002 - T242481
  • 16:08 volans@deploy1001: Finished deploy [debmonitor/deploy@e72911c]: Release v0.2.4 (duration: 01m 09s)
  • 16:07 volans@deploy1001: Started deploy [debmonitor/deploy@e72911c]: Release v0.2.4
  • 15:47 liw@deploy1001: Started scap: testwiki to php-1.34.0-wmf.15 and rebuild l10n cache (try 2)
  • 15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:02 marostegui: Copy data from db1080 to db1107 T242702
  • 15:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for tranfer', diff saved to https://phabricator.wikimedia.org/P10144 and previous config saved to /var/cache/conftool/dbconfig/20200114-150223-marostegui.json
  • 15:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime
  • 14:51 liw@deploy1001: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_44869219" --threads=30 --lang en --quiet' returned non-zero exit status 1 (duration: 03m 55s)
  • 14:47 liw@deploy1001: Started scap: testwiki to php-1.35.0-wmf.15 and rebuild l10n cache
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1080', diff saved to https://phabricator.wikimedia.org/P10143 and previous config saved to /var/cache/conftool/dbconfig/20200114-144341-marostegui.json
  • 14:26 marostegui: Move db1114 under db1080
  • 14:24 marostegui: Stop db1080 and db1107 replication in sync
  • 14:21 XioNoX: push firewall policies to pfw3-eqiad - T242681
  • 14:15 XioNoX: push firewall policies to pfw3-codfw - T242681
  • 14:12 liw: branch cut for 1.35.0-wmf.15
  • 14:09 vgutierrez: upgrade ats to 8.0.5-1wm12 in cp5006 and cp5012 - T242620
  • 14:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 13:54 marostegui: Upgrade db1080
  • 13:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1080 for upgrade', diff saved to https://phabricator.wikimedia.org/P10142 and previous config saved to /var/cache/conftool/dbconfig/20200114-135238-marostegui.json
  • 12:16 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir3002.esams.wmnet
  • 12:16 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir3001.esams.wmnet
  • 12:14 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir4001.ulsfo.wmnet
  • 12:14 vgutierrez@puppetmaster1001: conftool action : set/weight=1; selector: service=nginx,name=ncredir4002.ulsfo.wmnet
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:02 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:51 vgutierrez: restarting pybal on lvs4005 (high-traffic1 LVS) - T242321
  • 11:49 vgutierrez: restarting pybal on lvs4007 (secondary LVS) - T242321
  • 11:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir4002.ulsfo.wmnet
  • 11:47 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir4001.ulsfo.wmnet
  • 11:15 vgutierrez: Updating puppet-compiler facts
  • 10:40 vgutierrez: upgrade ats to 8.0.5-1wm12 in cp4026 and cp4032 - T242620
  • 10:07 moritzm: installing remaining cyrus-sasl security updates
  • 09:44 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/Wikibase/lib/includes/Store/Sql/Terms: wbterms: Add Statsd metrics in critical parts of the new term store (duration: 00m 57s)
  • 07:33 XioNoX: add peering to AS26744 in eqiad, eqord and eqdfw
  • 06:25 marostegui: Deploy schema change on flowdb (x1) directly on the master T242688
  • 06:23 marostegui: Deploy schema change on labswiki (wikitech) T242688
  • 06:20 marostegui: Deploy schema change on s3 master for officewiki and techconductwiki T242688
  • 06:01 marostegui: Remove partitions from revision table on db1103:3312
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10141 and previous config saved to /var/cache/conftool/dbconfig/20200114-060116-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1105:3312 after removing partitions from revision table', diff saved to https://phabricator.wikimedia.org/P10140 and previous config saved to /var/cache/conftool/dbconfig/20200114-060003-marostegui.json
  • 05:29 andrewbogott: rebooting cloudservices1004 to make sure all my upgrades are sustainable
  • 01:03 catrope@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/GrowthExperiments/: Various topic search-related cherry-picks (duration: 00m 57s)

2020-01-13

  • 21:35 milimetric@deploy1001: Finished deploy [analytics/refinery@690517c]: Referer Classify change (duration: 09m 08s)
  • 21:32 arlolra@deploy1001: Finished deploy [parsoid/deploy@dd92eeb]: Updating Parsoid to 5d37da1 (duration: 08m 21s)
  • 21:26 milimetric@deploy1001: Started deploy [analytics/refinery@690517c]: Referer Classify change
  • 21:24 arlolra@deploy1001: Started deploy [parsoid/deploy@dd92eeb]: Updating Parsoid to 5d37da1
  • 20:37 clarakosi@deploy1001: Finished deploy [restbase/deploy@bfdd342]: Use parsoid_uri, add ngwiki. T241756, T240771 (duration: 15m 41s)
  • 20:21 clarakosi@deploy1001: Started deploy [restbase/deploy@bfdd342]: Use parsoid_uri, add ngwiki. T241756, T240771
  • 19:39 tgr: ran disableOATHAuthForUser.php for T242543
  • 19:22 tgr@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert a temporary CommonsMetadata cache validation hook that has been unneeded for a long time (duration: 00m 56s)
  • 15:56 moritzm: installing cyrus-sasl security updates
  • 15:19 moritzm: remove hassium in Ganeti T224567
  • 15:19 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 15:18 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 15:18 akosiaris@deploy1001: helmfile [CODFW] Ran 'sync' command on namespace 'kube-system' for release 'rbac-deploy-clusterrole' .
  • 15:18 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 15:00 joal@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@a1b4d34]: Deploy hdfs-rsync bug correction (duration: 00m 08s)
  • 15:00 joal@deploy1001: Started deploy [analytics/hdfs-tools/deploy@a1b4d34]: Deploy hdfs-rsync bug correction
  • 14:58 jmm@cumin2001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1)
  • 14:57 jmm@cumin2001: START - Cookbook sre.hosts.decommission
  • 14:55 moritzm: remove hassaleh in Ganeti T224567
  • 14:24 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 55s)
  • 14:24 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 56s)
  • 13:11 moritzm: upgrade mw canaries to PHP 7.2.26 T241222
  • 12:08 Urbanecm: EU SWAT done
  • 12:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c7cf53c: Deploy partial blocks on enwiki (T242569) (duration: 00m 55s)
  • 11:58 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 55s)
  • 11:57 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (563985) (duration: 00m 55s)
  • 11:42 moritzm: upgrading remaining mwdebug* servers and mw1261 to PHP 7.2.26 T241222
  • 11:04 volans@deploy1001: Finished deploy [debmonitor/deploy@265059b]: Release v0.2.3 (duration: 01m 10s)
  • 11:03 volans@deploy1001: Started deploy [debmonitor/deploy@265059b]: Release v0.2.3
  • 10:51 vgutierrez: pooling esams for ncredir - T242321
  • 09:38 moritzm: rename Ganeti group in ulsfo from "default" to "row_1"
  • 09:16 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:16 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P10134 and previous config saved to /var/cache/conftool/dbconfig/20200113-075334-marostegui.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P10133 and previous config saved to /var/cache/conftool/dbconfig/20200113-073656-marostegui.json
  • 07:30 XioNoX: cr3-knams> clear bfd session fe80::5e5e:ab00:d3d:85c - T240659
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1112', diff saved to https://phabricator.wikimedia.org/P10132 and previous config saved to /var/cache/conftool/dbconfig/20200113-072611-marostegui.json
  • 06:45 marostegui: Upgrade db1112
  • 06:36 marostegui: Deploy schema change on db1112 with replication (lag will appear on s3 on labs) - T234052
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P10131 and previous config saved to /var/cache/conftool/dbconfig/20200113-063513-marostegui.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1081 for compression T232446', diff saved to https://phabricator.wikimedia.org/P10130 and previous config saved to /var/cache/conftool/dbconfig/20200113-062007-marostegui.json
  • 06:18 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1084', diff saved to https://phabricator.wikimedia.org/P10129 and previous config saved to /var/cache/conftool/dbconfig/20200113-061835-marostegui.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after compression', diff saved to https://phabricator.wikimedia.org/P10128 and previous config saved to /var/cache/conftool/dbconfig/20200113-061434-marostegui.json
  • 06:11 marostegui: Deploy schema change on s1 master (db1083) - T234052
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1013', diff saved to https://phabricator.wikimedia.org/P10127 and previous config saved to /var/cache/conftool/dbconfig/20200113-061106-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1075 T234052', diff saved to https://phabricator.wikimedia.org/P10126 and previous config saved to /var/cache/conftool/dbconfig/20200113-061025-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013', diff saved to https://phabricator.wikimedia.org/P10125 and previous config saved to /var/cache/conftool/dbconfig/20200113-060841-marostegui.json
  • 06:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after compression', diff saved to https://phabricator.wikimedia.org/P10124 and previous config saved to /var/cache/conftool/dbconfig/20200113-060112-marostegui.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1075 T234052', diff saved to https://phabricator.wikimedia.org/P10123 and previous config saved to /var/cache/conftool/dbconfig/20200113-060012-marostegui.json
  • 05:58 marostegui: Remove partitions from db1105:3312 - T239453
  • 05:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105:3312 - T239453', diff saved to https://phabricator.wikimedia.org/P10122 and previous config saved to /var/cache/conftool/dbconfig/20200113-055811-marostegui.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2091:3312', diff saved to https://phabricator.wikimedia.org/P10121 and previous config saved to /var/cache/conftool/dbconfig/20200113-055554-marostegui.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1084 after compression', diff saved to https://phabricator.wikimedia.org/P10120 and previous config saved to /var/cache/conftool/dbconfig/20200113-055315-marostegui.json
  • 05:51 marostegui: Deploy schema change on x1 master on flowdb with replication - T241387
  • 02:02 andrewbogott: restarted mariadb on cloudservices1003, cloudservices1004, cloudservices2001-dev, clouddb2001-dev for T239791
  • 00:58 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=cp3061.esams.wmnet
  • 00:53 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet
  • 00:23 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3061.esams.wmnet
  • 00:23 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 00:22 effie: depool and restart cp3065 cp3061 - T238305
  • 00:21 effie: depool and restart cp3065 cp3061

2020-01-12

  • 14:48 effie: restart php on mw1240
  • 14:46 effie: restart php on mw1238
  • 04:35 volker-e@deploy1001: Finished deploy [design/style-guide@8bec25e]: Deploy design/style-guide: (duration: 00m 07s)
  • 04:35 volker-e@deploy1001: Started deploy [design/style-guide@8bec25e]: Deploy design/style-guide:
  • 02:57 volker-e@deploy1001: Finished deploy [design/style-guide@cebc152]: Deploy design/style-guide: (duration: 00m 07s)
  • 02:57 volker-e@deploy1001: Started deploy [design/style-guide@cebc152]: Deploy design/style-guide:

2020-01-11

  • 05:34 volker-e@deploy1001: Finished deploy [design/style-guide@6a44c69]: Deploy design/style-guide: (duration: 00m 08s)
  • 05:34 volker-e@deploy1001: Started deploy [design/style-guide@6a44c69]: Deploy design/style-guide:

2020-01-10

  • 22:33 mutante: ms-be1026 sudo systemctl reset-failed (failed Session 372989 of user debmonitor)
  • 20:45 jeh: cloudcontrol200[13]-dev schedule downtime until Feb 28 2020 on systemd service check T242462
  • 20:29 jeh: cloudmetrics100[12] schedule downtime until Feb 28 2020 on prometheus check T242460
  • 20:03 urandom: drop legacy Parsoid/JS storage keyspaces, production env -- T242344
  • 19:56 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:54 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:52 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 19:51 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 19:48 mutante: LDAP - add Zbyszko Papierski to "wmf" group (T242341)
  • 19:47 mutante: LDAP - add Hugh Nowlan to "wmf" group (T242309)
  • 19:42 dcausse: restarting blazegraph on wdqs1005
  • 19:40 ebernhardson: restart mjolnir-kafka-bulk-daemon across eqiad and codfw search clusters
  • 19:40 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@e141941]: repair model upload in bulk daemon (duration: 05m 02s)
  • 19:35 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@e141941]: repair model upload in bulk daemon
  • 19:13 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:53 mutante: welcome new (restbase) service deployer Clara Andrew-Wani (T242152)
  • 18:29 bd808: Restarted zuul on contint1001; no logs since 2020-01-10 17:55:28,452
  • 11:48 moritzm: stop/mask nginx on hassium/hassaleh T224567
  • 10:56 akosiaris: repool mathoid codfw for testing canary support in the mathoid helm chart
  • 10:56 akosiaris@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=mathoid
  • 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'canary' .
  • 10:51 akosiaris@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'mathoid' for release 'production' .
  • 10:40 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'mathoid' for release 'staging' .
  • 10:38 akosiaris: depool mathoid codfw in preparation for testing canary support in the mathoid helm chart
  • 10:37 akosiaris@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=mathoid
  • 10:24 moritzm: rename Ganeti group for esams from "default" to "row_OE" T236216
  • 10:21 moritzm: rename Ganeti group for eqsin from "default" to "row_1" T228099
  • 09:02 marostegui: Remove revision partitions from db2091:3312
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2091:3312', diff saved to https://phabricator.wikimedia.org/P10113 and previous config saved to /var/cache/conftool/dbconfig/20200110-090143-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db2088:3312', diff saved to https://phabricator.wikimedia.org/P10112 and previous config saved to /var/cache/conftool/dbconfig/20200110-085921-marostegui.json
  • 08:55 vgutierrez: restarting pybal on lvs3005 (high-traffic1) - T242321
  • 08:51 vgutierrez: restarting pybal on lvs3007 - T242321
  • 08:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir3002.esams.wmnet
  • 08:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ncredir3001.esams.wmnet
  • 08:24 ema: cp3062: varnish-frontend-restart to clear things up after child crashes over the past days
  • 02:11 jhuneidi@deploy1001: Pruned MediaWiki: 1.35.0-wmf.10 (duration: 04m 13s)
  • 00:45 catrope@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/GrowthExperiments/: Expose tasktype/topic API parameter info (T240512) (duration: 01m 01s)
  • 00:35 shdubsh: restart prometheus on prometheus2004, enabling debug log

2020-01-09

  • 21:25 ebernhardson@deploy1001: Finished deploy [search/airflow@746c149]: Add skein to airflow venv (duration: 00m 55s)
  • 21:24 ebernhardson@deploy1001: Started deploy [search/airflow@746c149]: Add skein to airflow venv
  • 20:32 chasemp: add phabtest2 to #security temp to ensure reporting settings (T240605)
  • 20:06 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.35.0-wmf.14 refs T233862
  • 19:51 Urbanecm: Morning SWAT done
  • 19:51 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.14/resources/Resources.php: SWAT: 39bc331: Enable mediawiki.page.patrol.ajax on mobile (T242310) (duration: 01m 05s)
  • 19:35 urbanecm@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/MobileFrontend/: SWAT: 31d3be7: Hot fixes for mobile diff page (T242310) (duration: 01m 09s)
  • 19:13 urbanecm@deploy1001: Synchronized wmf-config/mobile.php: SWAT: 2f9ee90: Drop beta setting (T237290) (duration: 01m 06s)
  • 18:56 otto@deploy1001: Finished deploy [analytics/hdfs-tools/deploy@f8e9d6f]: (no justification provided) (duration: 00m 08s)
  • 18:55 otto@deploy1001: Started deploy [analytics/hdfs-tools/deploy@f8e9d6f]: (no justification provided)
  • 18:05 otto@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:03 otto@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:01 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 17:38 volans@cumin1001: conftool action : set/weight=10; selector: name=elastic106.*.eqiad.wmnet
  • 17:38 volans@cumin1001: conftool action : set/weight=10; selector: name=elastic105[3-9].eqiad.wmnet
  • 17:37 volans: confctl set/weight=10 for elastic10[53-67] - T242348
  • 15:46 ema: cp3058: varnish-frontend-restart to clear things up after child crash yesterday
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1078', diff saved to https://phabricator.wikimedia.org/P10110 and previous config saved to /var/cache/conftool/dbconfig/20200109-152545-marostegui.json
  • 15:21 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P10109 and previous config saved to /var/cache/conftool/dbconfig/20200109-152157-marostegui.json
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P10108 and previous config saved to /var/cache/conftool/dbconfig/20200109-151434-marostegui.json
  • 15:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1078', diff saved to https://phabricator.wikimedia.org/P10107 and previous config saved to /var/cache/conftool/dbconfig/20200109-150333-marostegui.json
  • 14:38 papaul: upgrading Firmware on backup2001
  • 14:27 marostegui: Upgrade db1078
  • 14:27 ema: cp3054: varnish-frontend-restart to clear things up after child crash yesterday
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1078', diff saved to https://phabricator.wikimedia.org/P10105 and previous config saved to /var/cache/conftool/dbconfig/20200109-141057-marostegui.json
  • 14:04 moritzm: imported PHP 7.2.26 to component/php72 for stretch-wikimedia
  • 13:48 moritzm: upgrading mwdebug2002 to PHP 7.2.26 T241224
  • 13:47 moritzm: upgrading mwdebug2002 to PHP 7.2.26
  • 12:41 marostegui: Deploy schema change on s3 codfw, lag will appear on s3 codfw - T234052
  • 12:25 jynus: shutting down backup2001 T240177
  • 12:22 Urbanecm: EU SWAT done
  • 12:19 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: ed0357a: Set $wgArticleCountMethod to any for minwiktionary (T241694) (duration: 01m 08s)
  • 12:17 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 06394ea: Add ipblock-exempt and extendedconfirmed to bot group on fawiki (T241904) (duration: 01m 05s)
  • 12:11 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set wmgUseEntitySourceBasedFederation for test.wikidata.org (T241973) (duration: 01m 07s)
  • 11:23 moritzm: installing cyrus-sasl security updates
  • 11:04 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:04 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1106', diff saved to https://phabricator.wikimedia.org/P10104 and previous config saved to /var/cache/conftool/dbconfig/20200109-100948-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P10103 and previous config saved to /var/cache/conftool/dbconfig/20200109-100552-marostegui.json
  • 09:56 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P10102 and previous config saved to /var/cache/conftool/dbconfig/20200109-095433-marostegui.json
  • 09:53 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1106', diff saved to https://phabricator.wikimedia.org/P10101 and previous config saved to /var/cache/conftool/dbconfig/20200109-095249-marostegui.json
  • 09:48 marostegui: Upgrade db1106
  • 09:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1106 for upgrade', diff saved to https://phabricator.wikimedia.org/P10100 and previous config saved to /var/cache/conftool/dbconfig/20200109-094748-marostegui.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1118', diff saved to https://phabricator.wikimedia.org/P10099 and previous config saved to /var/cache/conftool/dbconfig/20200109-093946-marostegui.json
  • 09:32 marostegui: Deploy schema change on db1106, this will generate a bit of lag on s1 labs
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P10098 and previous config saved to /var/cache/conftool/dbconfig/20200109-093119-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P10097 and previous config saved to /var/cache/conftool/dbconfig/20200109-082243-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1118', diff saved to https://phabricator.wikimedia.org/P10096 and previous config saved to /var/cache/conftool/dbconfig/20200109-081629-marostegui.json
  • 07:40 XioNoX: enable traceoptions for BFD on cr2-eqdfw - T240659
  • 07:37 marostegui: Upgrade db1118
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P10094 and previous config saved to /var/cache/conftool/dbconfig/20200109-073713-marostegui.json
  • 06:27 marostegui: Remove revision partitions from db2088:3312 T239453
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2088:3312 T239453', diff saved to https://phabricator.wikimedia.org/P10093 and previous config saved to /var/cache/conftool/dbconfig/20200109-062608-marostegui.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1096:3315 db1096:3316 T239453', diff saved to https://phabricator.wikimedia.org/P10092 and previous config saved to /var/cache/conftool/dbconfig/20200109-062157-marostegui.json
  • 00:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: (no-op) set config page for newcomer tasks (T233465) (duration: 01m 05s)

2020-01-08

  • 23:44 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Roll commonswiki forward to 1.35.0-wmf.14
  • 23:34 jforrester@deploy1001: Synchronized php-1.35.0-wmf.14/extensions/WikibaseMediaInfo/resources/statements/StatementWidget.js: T242286 Update StatementWidget initialization logic (duration: 01m 05s)
  • 23:14 XenoRyet: updated civicrm from 42e88f92a9 to 9ac771a913
  • 23:09 mutante: LDAP - added moushirael to 'wmf' (T242000)
  • 22:39 mutante: restarted zuul on contint1001
  • 21:56 arlolra: Updated Parsoid to f963e51 (T238934, T237318, T238022, T228217)
  • 21:46 XenoRyet: updated civicrm from 2468d85f95 to 42e88f92a9
  • 21:46 arlolra@deploy1001: Finished deploy [parsoid/deploy@45a4245]: Updating Parsoid to f963e51 (duration: 08m 00s)
  • 21:38 arlolra@deploy1001: Started deploy [parsoid/deploy@45a4245]: Updating Parsoid to f963e51
  • 21:30 mutante: phab1003 - running decom cookbook - shutdown host, removed from puppetmaster, debmonitor etc (T238957)
  • 21:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
  • 21:29 dzahn@cumin1001: START - Cookbook sre.hosts.decommission
  • 21:28 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: Revert "commonswiki to 1.35.0-wmf.11"
  • 21:21 halfak@deploy1001: Finished deploy [ores/deploy@039251f]: T242035 (duration: 16m 32s)
  • 21:07 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-main' for release 'main' .
  • 21:04 halfak@deploy1001: Started deploy [ores/deploy@039251f]: T242035
  • 21:03 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 21:00 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 20:53 XenoRyet: updated civicrm from 51b6fca9b2 to 2468d85f95
  • 20:51 jhuneidi@deploy1001: Synchronized php: group1 wikis to 1.35.0-wmf.14 refs T233862 (duration: 01m 04s)
  • 20:50 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.35.0-wmf.14 refs T233862
  • 20:40 mutante: contint1001 - restarting zuul service
  • 20:00 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-analytics' for release 'analytics' .
  • 19:31 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:16 mutante: LDAP - added 'sihe' to 'wmde' and 'nda' (T242080)
  • 19:15 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 19:13 joal@deploy1001: Finished deploy [analytics/refinery@c205576] (thin): Regular analytics weekly deploy train [thin] (duration: 00m 07s)
  • 19:13 joal@deploy1001: Started deploy [analytics/refinery@c205576] (thin): Regular analytics weekly deploy train [thin]
  • 19:13 joal@deploy1001: Finished deploy [analytics/refinery@c205576]: Regular analytics weekly deploy train (duration: 08m 36s)
  • 19:04 joal@deploy1001: Started deploy [analytics/refinery@c205576]: Regular analytics weekly deploy train
  • 18:46 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:46 marostegui: Remove partitions from dewiki.revision on db1096:3315 T239453
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3315', diff saved to https://phabricator.wikimedia.org/P10090 and previous config saved to /var/cache/conftool/dbconfig/20200108-184510-marostegui.json
  • 18:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1097:3315', diff saved to https://phabricator.wikimedia.org/P10089 and previous config saved to /var/cache/conftool/dbconfig/20200108-184350-marostegui.json
  • 18:39 otto@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'eventgate-logging-external' for release 'logging-external' .
  • 18:36 ppchelko@deploy1001: Finished deploy [restbase/deploy@ebb1849]: Clean up Parsoid-PHP transition code & config T241756 (duration: 14m 27s)
  • 18:33 volans: restarted wikibugs
  • 18:22 ppchelko@deploy1001: Started deploy [restbase/deploy@ebb1849]: Clean up Parsoid-PHP transition code & config T241756
  • 18:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@ebb1849] (dev-cluster): Clean up Parsoid-PHP transition code & config T241756 (duration: 02m 41s)
  • 18:18 ppchelko@deploy1001: Started deploy [restbase/deploy@ebb1849] (dev-cluster): Clean up Parsoid-PHP transition code & config T241756
  • 18:07 elukey@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0)
  • 18:04 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 18:03 elukey@cumin1001: END (FAIL) - Cookbook sre.aqs.roll-restart (exit_code=99)
  • 18:03 elukey@cumin1001: START - Cookbook sre.aqs.roll-restart
  • 16:25 _joe_: running puppet on deploy1001 to remove my hot-patch to scap.cfg
  • 16:20 ema: rolling ats-be restart on !text@eqiad, !text@esams to apply https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/562849/
  • 16:00 bblack: re-pooling esams text traffic in DNS
  • 15:45 ema: cumin -s10 -b1 'A:cp-text_eqiad' 'run-puppet-agent -q ; ats-backend-restart'
  • 15:40 vgutierrez: restarting ats-tls on esams text nodes
  • 15:37 ema: cumin -s10 -b1 'A:cp-text_esams' 'run-puppet-agent -q ; ats-backend-restart'
  • 15:37 bblack: authdns-update to depool esams
  • 15:26 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 00m 34s)
  • 15:24 otto@deploy1001: sync-file aborted: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 03m 56s)
  • 15:20 otto@deploy1001: sync-file aborted: REVERT Make EventBus use TLS for eventgate-analytics - T242224 (duration: 06m 33s)
  • 15:12 otto@deploy1001: Scap failed!: 4/11 canaries failed their endpoint checks(http://en.wikipedia.org)
  • 15:11 otto@deploy1001: sync-file aborted: Make EventBus use TLS for eventgate-analytics - T242224 (duration: 00m 00s)
  • 15:10 otto@deploy1001: Synchronized wmf-config/ProductionServices.php: Make EventBus use TLS for eventgate-analytics - T242224 (duration: 06m 10s)
  • 15:02 XioNoX: Routinator 0.6.4 looking good on rpki2001, upgrading rpki1001 - T242197
  • 15:00 ottomata: deploying change to make EventBus use new TLS port for eventgate-analytics - T242224
  • 14:35 ema: repool cp4028 after successful X-Analytics-TLS patch test T237993
  • 14:23 ema: depool cp4028 to test X-Analytics-TLS patch T237993
  • 14:07 XioNoX: add routinator 0.6.4 to reprepro stretch-wikimedia - T242197
  • 14:00 ariel@deploy1001: Finished deploy [dumps/dumps@dbd0ecd]: don't regenerate existing 7z files on rerun of the 7z recompression job (duration: 00m 05s)
  • 14:00 ariel@deploy1001: Started deploy [dumps/dumps@dbd0ecd]: don't regenerate existing 7z files on rerun of the 7z recompression job
  • 12:46 _joe_: deleting releng/composer-php55:0.1.0 from the docker registry
  • 12:36 Lucas_WMDE: EU SWAT done
  • 12:34 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Update Skolt Sami language name (T223544) (duration: 01m 06s)
  • 12:30 lucaswerkmeister-wmde@deploy1001: Synchronized php-1.35.0-wmf.11/extensions/Cite: SWAT: Fix handling of `` (T241303) (duration: 01m 06s)
  • 12:17 tarrow@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable tainted references on test.wikidata.org (T239621) (duration: 01m 19s)
  • 12:08 kart_: Updated cxserver to 2020-01-06-070550-production (T233405)
  • 12:04 kartik@deploy1001: helmfile [EQIAD] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:01 kartik@deploy1001: helmfile [CODFW] Ran 'apply' command on namespace 'cxserver' for release 'production' .
  • 12:00 kartik@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'cxserver' for release 'staging' .
  • 11:47 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2001.*
  • 11:45 akosiaris@cumin1001: conftool action : set/weight=10; selector: service=echostore
  • 11:44 vgutierrez: uploaded varnish 5.1.3-1wm12 to apt.wikimedia.org (buster) - T242093
  • 11:44 akosiaris@cumin1001: conftool action : set/weight=10; selector: name=kubernetes1001.*
  • 11:44 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes1001.*
  • 11:07 moritzm: test failover of Ganeti master in eqsin T228099
  • 11:00 moritzm: drain ganeti5003 to test new Ganeti setup in eqsin T228099
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 10:41 moritzm: rebooting netflow5001 to pick up microcode
  • 10:08 moritzm: enabling spec-ctrl, ssbd, md-clear passthrough for new eqsin cluster T228099
  • 09:27 moritzm: installing urldownloader1002 T241979
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1085', diff saved to https://phabricator.wikimedia.org/P10088 and previous config saved to /var/cache/conftool/dbconfig/20200108-091124-marostegui.json
  • 09:00 moritzm: installing urldownloader1001 T241979
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P10087 and previous config saved to /var/cache/conftool/dbconfig/20200108-082930-marostegui.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1085', diff saved to https://phabricator.wikimedia.org/P10086 and previous config saved to /var/cache/conftool/dbconfig/20200108-082050-marostegui.json
  • 08:09 marostegui: Upgrade db1085
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1085', diff saved to https://phabricator.wikimedia.org/P10085 and previous config saved to /var/cache/conftool/dbconfig/20200108-080853-marostegui.json
  • 08:07 marostegui: Deploy schema change on s1 codfw, there will be lag on s1 codfw - T234052
  • 07:58 marostegui: Deploy schema change on clouddb2001-dev.labtestwiki - T234052
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1079', diff saved to https://phabricator.wikimedia.org/P10084 and previous config saved to /var/cache/conftool/dbconfig/20200108-072017-marostegui.json
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10083 and previous config saved to /var/cache/conftool/dbconfig/20200108-071312-marostegui.json
  • 07:07 marostegui: Remove partitions from dewiki.revision on db1097:3315 T239453
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1097:3315', diff saved to https://phabricator.wikimedia.org/P10082 and previous config saved to /var/cache/conftool/dbconfig/20200108-070712-marostegui.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10081 and previous config saved to /var/cache/conftool/dbconfig/20200108-070614-marostegui.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1079', diff saved to https://phabricator.wikimedia.org/P10080 and previous config saved to /var/cache/conftool/dbconfig/20200108-070009-marostegui.json
  • 06:56 marostegui: Upgrade db1079
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1079', diff saved to https://phabricator.wikimedia.org/P10079 and previous config saved to /var/cache/conftool/dbconfig/20200108-064404-marostegui.json
  • 06:42 marostegui: Remove partitions from revision table on s6 for db1096:3316 - T239453
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1096:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10078 and previous config saved to /var/cache/conftool/dbconfig/20200108-064144-marostegui.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10077 and previous config saved to /var/cache/conftool/dbconfig/20200108-063550-marostegui.json
  • 05:41 XioNoX: enable netflow in eqsin
  • 03:54 volker-e@deploy1001: Finished deploy [design/style-guide@ad595d5]: Deploy design/style-guide: (duration: 00m 08s)
  • 03:54 volker-e@deploy1001: Started deploy [design/style-guide@ad595d5]: Deploy design/style-guide:
  • 00:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@024488f]: airflow: set mjolnir dag start date to today (20200108) (duration: 00m 42s)
  • 00:37 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@024488f]: airflow: set mjolnir dag start date to today (20200108)
  • 00:21 reedy@deploy1001: Synchronized wmf-config/throttle.php: T240845 (duration: 01m 04s)

2020-01-07

  • 23:53 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@cb228ae]: Force python to use python3.5 dependencies (take two) (duration: 00m 10s)
  • 23:53 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@cb228ae]: Force python to use python3.5 dependencies (take two)
  • 23:36 mutante: [puppetmaster2001:/var/run/confd-template] $ sudo rm .cloudceph*.err
  • 23:02 cdanis: cp3055.mgmt% racadm serveraction powercycle T240425
  • 20:42 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@6c1f455]: Bump to master: Allow cli to load without pyspark (duration: 05m 55s)
  • 20:40 jhuneidi@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.35.0-wmf.14 refs T233862
  • 20:36 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@6c1f455]: Bump to master: Allow cli to load without pyspark
  • 20:30 jhuneidi@deploy1001: Finished scap: testwikis wikis to 1.35.0-wmf.14 refs T233862 (duration: 29m 01s)
  • 20:12 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@867d674]: Bump to master: Allow cli to load without pyspark (duration: 05m 13s)
  • 20:06 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@867d674]: Bump to master: Allow cli to load without pyspark
  • 20:01 jhuneidi@deploy1001: Started scap: testwikis wikis to 1.35.0-wmf.14 refs T233862
  • 19:28 James_F: mwscript createAndPromote.php foundationwiki 'Jdforrester (WMF)' --force --custom-groups=interface-admin for T241950
  • 19:02 James_F: 1.35.0-wmf.14 was branched at fb16374 T233862
  • 18:38 ebernhardson@deploy1001: Finished deploy [wikimedia/discovery/analytics@511f745]: [airflow] Force PYTHONPATH to use pyspark 3.5 deps (duration: 00m 14s)
  • 18:38 ebernhardson@deploy1001: Started deploy [wikimedia/discovery/analytics@511f745]: [airflow] Force PYTHONPATH to use pyspark 3.5 deps
  • 17:31 Urbanecm: Run scap pull at mwdebug1001, test over
  • 17:29 Urbanecm: Stashing at mwdebug1001
  • 17:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2076 T241647', diff saved to https://phabricator.wikimedia.org/P10072 and previous config saved to /var/cache/conftool/dbconfig/20200107-172839-marostegui.json
  • 17:23 marostegui: Remove partitions from dewiki.revision from db2089:3315 T239453
  • 17:19 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1088', diff saved to https://phabricator.wikimedia.org/P10071 and previous config saved to /var/cache/conftool/dbconfig/20200107-171955-marostegui.json
  • 17:18 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@b378752]: bump numpy to 1.17.2 (duration: 05m 53s)
  • 17:18 vgutierrez: restarting pybal on lvs1015 - T240715
  • 17:13 vgutierrez: restarting pybal on lvs1016 - T240715
  • 17:12 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@b378752]: bump numpy to 1.17.2
  • 17:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1003.wikimedia.org
  • 17:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1002.wikimedia.org
  • 17:10 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: service=cloudceph,name=cloudcephmon1001.wikimedia.org
  • 16:43 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable banner on Special:Block for partial blocks early-adopter wikis T240300 (duration: 00m 57s)
  • 16:10 elukey: cr1/cr2-eqiad: set port 443 (was 8190) for term schema in analytics-in4
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10070 and previous config saved to /var/cache/conftool/dbconfig/20200107-154529-marostegui.json
  • 15:44 papaul: shutting down db2076 for FW upgrade
  • 15:41 moritzm: installing urldownloader2002 T241979
  • 15:30 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 15:27 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 15:23 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10069 and previous config saved to /var/cache/conftool/dbconfig/20200107-152304-marostegui.json
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1088', diff saved to https://phabricator.wikimedia.org/P10068 and previous config saved to /var/cache/conftool/dbconfig/20200107-151633-marostegui.json
  • 15:11 moritzm: installing urldownloader2001 T241979
  • 15:09 moritzm: reimaging mw2282
  • 15:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1088 for upgrade', diff saved to https://phabricator.wikimedia.org/P10067 and previous config saved to /var/cache/conftool/dbconfig/20200107-150440-marostegui.json
  • 14:39 _joe_: uploading python3-docker-report to {buster,stretch}-wikimedia, T241206
  • 14:35 marostegui: Power off db2076 for on-site maintenance T241647
  • 14:32 marostegui: Stop MySQL on db2076 for maintenance T241647
  • 14:22 marostegui: Deploy schema change on s7 codfw master, this will generate lag on s7 codfw - T234052
  • 14:21 marostegui: Deploy schema change on s2 codfw master, this will generate lag on s2 codfw - T234052
  • 14:05 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1126', diff saved to https://phabricator.wikimedia.org/P10066 and previous config saved to /var/cache/conftool/dbconfig/20200107-140300-marostegui.json
  • 14:02 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:43 moritzm: reimaging mw2282
  • 13:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1126', diff saved to https://phabricator.wikimedia.org/P10065 and previous config saved to /var/cache/conftool/dbconfig/20200107-134251-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1104', diff saved to https://phabricator.wikimedia.org/P10064 and previous config saved to /var/cache/conftool/dbconfig/20200107-133439-marostegui.json
  • 12:56 Lucas_WMDE: EU SWAT done
  • 12:56 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix WBRepoCanonicalUriProperty setting for testwikidatawiki (duration: 00m 54s)
  • 12:52 lucaswerkmeister-wmde@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix wgImportSources setting for wikidata dblist (duration: 00m 54s)
  • 12:39 Urbanecm: Run mwscript initSiteStats.php --wiki=tawiktionary --update (T241684)
  • 12:38 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 5be01f0: Modify $wgArticleCount to any for ta.wiktionary (T241684) (duration: 00m 55s)
  • 12:32 urbanecm@deploy1001: Synchronized static/images/project-logos/: SWAT: d6ee5fe: Modify ge.wikimedia project logos (T241327) (duration: 00m 57s)
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1104', diff saved to https://phabricator.wikimedia.org/P10063 and previous config saved to /var/cache/conftool/dbconfig/20200107-122914-marostegui.json
  • 12:17 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up unused configs in InitialiseSettings.php (T238154) (duration: 00m 54s)
  • 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Clean up unused configs in InitialiseSettings.php (T238154) (duration: 00m 55s)
  • 12:13 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Clean up unused configs in Wikibase.php (T238154) (duration: 00m 54s)
  • 12:12 ladsgroup@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Clean up unused configs in Wikibase.php (T238154) (duration: 00m 54s)
  • 12:11 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Clean up unused configs in Wikibase.php (T238154) (duration: 00m 56s)
  • 11:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 11:12 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 11:12 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 11:11 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 11:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 11:10 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:53 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:39 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.11/extensions/Wikibase/lib/includes/Store/Sql/Terms/DatabaseTermIdsAcquirer.php: Temporarily add metrics on the need to reinsert in the new term store (duration: 00m 57s)
  • 10:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:19 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Fully repool db1092', diff saved to https://phabricator.wikimedia.org/P10062 and previous config saved to /var/cache/conftool/dbconfig/20200107-100743-marostegui.json
  • 10:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 10:05 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10061 and previous config saved to /var/cache/conftool/dbconfig/20200107-100157-marostegui.json
  • 10:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10060 and previous config saved to /var/cache/conftool/dbconfig/20200107-095501-marostegui.json
  • 09:52 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'staging' .
  • 09:52 akosiaris@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10059 and previous config saved to /var/cache/conftool/dbconfig/20200107-094944-marostegui.json
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'Slowly repool db1092', diff saved to https://phabricator.wikimedia.org/P10058 and previous config saved to /var/cache/conftool/dbconfig/20200107-094506-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1092 for alter and upgrade', diff saved to https://phabricator.wikimedia.org/P10057 and previous config saved to /var/cache/conftool/dbconfig/20200107-092221-marostegui.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1084 for compression', diff saved to https://phabricator.wikimedia.org/P10056 and previous config saved to /var/cache/conftool/dbconfig/20200107-082236-marostegui.json
  • 08:11 ayounsi@deploy1001: Finished deploy [librenms/librenms@7a0f7aa]: Upgrade LibreNMS to 1.59 - T241962 (duration: 00m 10s)
  • 08:11 ayounsi@deploy1001: Started deploy [librenms/librenms@7a0f7aa]: Upgrade LibreNMS to 1.59 - T241962
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1019', diff saved to https://phabricator.wikimedia.org/P10055 and previous config saved to /var/cache/conftool/dbconfig/20200107-074159-marostegui.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1019 for upgrade', diff saved to https://phabricator.wikimedia.org/P10054 and previous config saved to /var/cache/conftool/dbconfig/20200107-074035-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1013', diff saved to https://phabricator.wikimedia.org/P10053 and previous config saved to /var/cache/conftool/dbconfig/20200107-073922-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1013 for upgrade', diff saved to https://phabricator.wikimedia.org/P10052 and previous config saved to /var/cache/conftool/dbconfig/20200107-073543-marostegui.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool es1018', diff saved to https://phabricator.wikimedia.org/P10051 and previous config saved to /var/cache/conftool/dbconfig/20200107-073508-marostegui.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1018 for upgrade', diff saved to https://phabricator.wikimedia.org/P10050 and previous config saved to /var/cache/conftool/dbconfig/20200107-072930-marostegui.json
  • 07:15 marostegui: Remove partitions from s5: db2084:3315 T239453
  • 07:13 marostegui: Remove partitions from revision table on s6: db1098 T239453
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1098:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10049 and previous config saved to /var/cache/conftool/dbconfig/20200107-070850-marostegui.json
  • 07:05 marostegui: Depool labsdb1011
  • 07:03 marostegui: Deploy schema change on s8 codfw (this will generate lag on s8 codfw) - T234052
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2089:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10048 and previous config saved to /var/cache/conftool/dbconfig/20200107-064846-marostegui.json
  • 01:17 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 01:15 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings - clean up groupOverrides layout / spacing (sync again) (duration: 00m 53s)
  • 01:14 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: InitialiseSettings - clean up groupOverrides layout / spacing (duration: 00m 54s)
  • 01:12 mutante: ganeti - creating urldownloader2002.wikimedia.org in codfw_B with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 01:12 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 01:09 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 01:04 mutante: ganeti - creating urldownloader2001.wikimedia.org in codfw_A with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 01:04 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 01:03 ebernhardson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert: "cirrus: Shift more_like to codfw cirrus cluster" (duration: 00m 54s)
  • 01:02 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:59 mutante: ganeti - creating urldownloader1002.wikimedia.org in eqiad_C with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 00:58 mutante: ganeti - creating urldownloader1001.wikimedia.org in eqiad_A with 1 CPU, 1 GB RAM, 10 GB disk, public IP (T241979)
  • 00:57 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:57 ebernhardson@deploy1001: Synchronized wmf-config/CirrusSearch-common.php: Revert "reduce query load on cirrus elastic clusters" (duration: 00m 54s)
  • 00:57 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
  • 00:56 ebernhardson@deploy1001: sync-file aborted: Revert (duration: 00m 00s)
  • 00:52 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm
  • 00:46 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: GrowthExperiments: use local search in production (T235717) (duration: 00m 54s)
  • 00:45 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: GrowthExperiments: use local search in production (T235717) (duration: 00m 58s)
  • 00:27 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Partial Blocks on every wiki excluding those that have opted-out (T218626) (duration: 00m 55s)

2020-01-06

  • 23:49 ejegg: updated payments-wiki from 827e3235dc to c3ca3ad6a7
  • 23:12 mutante: mailman - running /usr/local/sbin/rename_list wikimediamy wikimedia-my (T241988)
  • 22:34 eileen: civicrm revision changed from b7746c31aa to 51b6fca9b2, config revision is b8af24d7c8
  • 21:28 Amir1: starting rebuild of holes in new term store from Q1Mio to Q10Mio using screen in mwmaint1002 (T219123)
  • 20:06 ejegg: updated fundraising civicrm from 5642a92223 to b7746c31aa
  • 20:02 mutante: LDAP - added 'krli' (Kris Litson) to 'wmde' and 'nda' for superset access (T241722)
  • 19:39 ebernhardson@deploy1001: Finished deploy [search/mjolnir/deploy@41a22b8]: Bump to latest master (duration: 06m 57s)
  • 19:32 ebernhardson@deploy1001: Started deploy [search/mjolnir/deploy@41a22b8]: Bump to latest master
  • 19:26 Urbanecm: Morning SWAT done
  • 19:25 urbanecm@deploy1001: Synchronized dblists/commonsuploads.dblist: SWAT: 0f045c3: Enable local uploads on inh.wiki (T239925) (duration: 00m 54s)
  • 19:22 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 7722ff3: 0bff587: Add www.digital.archives.go.jp/mediaphoto.mnhn.fr to the wgCopyUploadsDomains (T238476, T241637) (duration: 00m 54s)
  • 19:19 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 1324af9: Add throttle rule for ECLAC editathon in Santiago, Chile (T241414) (duration: 00m 54s)
  • 19:14 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c3a3248: Add sandboxlink for eswikivoyage (T241163) (duration: 00m 58s)
  • 19:08 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: d7a19ca: Enable GeoData extension in ruwikinews (T239000) (duration: 00m 56s)
  • 18:49 ebernhardson@deploy1001: Finished deploy [search/airflow@8db442c]: match cryptography package with debian buster (duration: 00m 53s)
  • 18:48 ebernhardson@deploy1001: Started deploy [search/airflow@8db442c]: match cryptography package with debian buster
  • 18:17 ebernhardson@deploy1001: Finished deploy [search/airflow@8ae8500]: Require apache-airflow[kerberos] python package (duration: 00m 27s)
  • 18:16 ebernhardson@deploy1001: Started deploy [search/airflow@8ae8500]: Require apache-airflow[kerberos] python package
  • 17:11 jakob@deploy1001: helmfile [STAGING] Ran 'sync' command on namespace 'termbox' for release 'test' .
  • 16:56 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 16:27 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 15:57 milimetric@deploy1001: Finished deploy [analytics/refinery@09133cf]: Fix for geoeditors monthly (duration: 11m 49s)
  • 15:47 herron: migrating mx1001 to secondary ganeti node T240906
  • 15:45 milimetric@deploy1001: Started deploy [analytics/refinery@09133cf]: Fix for geoeditors monthly
  • 15:30 jforrester@deploy1001: Synchronized wmf-config/InitialiseSettings.php: [officewiki] Grant ipblock-exempt to all users T231943 (duration: 00m 56s)
  • 15:06 ariel@deploy1001: Finished deploy [dumps/dumps@db81d78]: avoid aborts on some symlink cleanup failures (duration: 00m 06s)
  • 15:06 ariel@deploy1001: Started deploy [dumps/dumps@db81d78]: avoid aborts on some symlink cleanup failures
  • 15:04 XioNoX: remove BGP to AS13285 in ulsfo (IXP not listed in peeringdb anymore)
  • 14:56 XioNoX: remove BGP to AS13285 in eqiad (IXP not listed in peeringdb anymore)
  • 14:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: Enable WebAuthn everywhere (duration: 00m 54s)
  • 14:31 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable WebAuthn everywhere (duration: 00m 57s)
  • 13:56 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 13:54 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 13:35 moritzm: reimaging mw2282 to validate correctness of apt::package_from_component for fresh installs
  • 12:58 Urbanecm: EU SWAT done
  • 12:56 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 88c800c: Add basic transwiki sources for ltwiki (T241288) (duration: 00m 54s)
  • 12:50 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: c44b4ff: Enable subpages for the main namespace on ge.wikimedia (T241329) (duration: 00m 55s)
  • 12:46 Urbanecm: mwscript namespaceDupes.php --wiki=napwikisource --fix (T231880)
  • 12:45 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 864a2f8: Set Author and Author_talk aliases for Autore NS at napwikisource (T231880) (duration: 00m 55s)
  • 12:43 Urbanecm: mwscript namespaceDupes.php --wiki=zhwiktionary --fix (T241023)
  • 12:41 urbanecm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: 0baf554: Add new namespace and aliases for zh.wiktionary (T241023) (duration: 00m 54s)
  • 12:39 urbanecm@deploy1001: sync-file aborted: SWAT: 0ac5032: Add throttle exception for Amical Wikimedia Workshop (T241705) (duration: 00m 01s)
  • 12:39 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 12:37 urbanecm@deploy1001: Synchronized wmf-config/throttle.php: SWAT: 0ac5032: Add throttle exception for Amical Wikimedia Workshop (T241705) (duration: 00m 56s)
  • 12:31 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don’t check constraints on P6685 statements Bypassing T236104 (duration: 00m 55s)
  • 12:28 ladsgroup@deploy1001: Synchronized php-1.35.0-wmf.11/maintenance/rebuildLocalisationCache.php: SWAT: Add option to override storeClass in rebuildLocalisationCache (T105683 T99740) (duration: 00m 55s)
  • 12:25 ladsgroup@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Revert "Add a bit for forcing LC caching backend in cli mode" (duration: 00m 54s)
  • 12:23 jakob@deploy1001: helmfile [STAGING] Ran 'apply' command on namespace 'termbox' for release 'test' .
  • 12:18 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don’t check constraints on P6685 statements (T227865) (duration: 00m 55s)
  • 12:15 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set read new for item term store up to Q100K (T219123) (duration: 00m 55s)
  • 12:07 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable wgCiteResponsiveReferences on cswiki (T241304) (duration: 00m 56s)
  • 11:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
  • 11:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 55s)
  • 10:56 urbanecm@deploy1001: Synchronized private/PrivateSettings.php: Revert T227416 mitigations (duration: 01m 05s)
  • 10:39 moritzm: installing libbsd security updates
  • 09:53 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:51 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 09:32 moritzm: reimaging mw2282 to validate correctness of apt::package_from_component for fresh installs
  • 07:37 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=2 --file=/tmp/1mio.lines (T219301)
  • 03:53 Amir1: ladsgroup@mwmaint1002:~$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemTerms.php --wiki=wikidatawiki --batch-size=100 --sleep=2 --file=/tmp/100k.lines (T219301)
  • 00:06 effie: pool cp3065 T238305
  • 00:05 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=cp3065.esams.wmnet

2020-01-05

  • 23:56 effie: powercycle cp3065.esams.wmnet T238305
  • 23:53 jiji@cumin1001: conftool action : set/pooled=no; selector: name=cp3065.esams.wmnet
  • 13:09 Urbanecm: mwmaint1002: mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Coffeeandcrumbs /home/urbanecm/T241917 (T241917)

2020-01-04

  • 16:34 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 16:34 aborrero@cumin1001: START - Cookbook sre.hosts.downtime
  • 16:34 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99)
  • 16:34 aborrero@cumin1001: START - Cookbook sre.hosts.downtime

2020-01-03

  • 22:14 volker-e@deploy1001: Finished deploy [design/style-guide@8054026]: Deploy design/style-guide: (duration: 00m 08s)
  • 22:14 volker-e@deploy1001: Started deploy [design/style-guide@8054026]: Deploy design/style-guide:
  • 17:44 jynus@cumin1001: dbctl commit (dc=all): 'Repool db2084 instances T241103', diff saved to https://phabricator.wikimedia.org/P10035 and previous config saved to /var/cache/conftool/dbconfig/20200103-174447-jynus.json
  • 16:54 ejegg: updated fundraising CiviCRM from 217a1f8c63 to 5642a92223
  • 16:36 jynus: stopping db2084
  • 15:04 marostegui: Upgrade db2107
  • 14:58 marostegui: Deploy schema changes on s2 and s4 eqiad hosts T234052
  • 14:56 jbond42: clean up old /etc/apt/preferences.d/smartmontools.pref file
  • 14:48 jbond42: clean up old /etc/apt/preferences.d/puppet_all.pref file
  • 14:45 jbond42: clean up old /etc/apt/preferences.d/facter.pref file
  • 14:15 Urbanecm: Run undelete.php on a couple of pages at plwikisource per T241824
  • 13:50 marostegui: Deploy schema change on s4 codfw (lag will appear on codfw s4) - T234052
  • 13:46 moritzm: restarting exim on MXes to pick up SASL security update
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1074 after schema change', diff saved to https://phabricator.wikimedia.org/P10033 and previous config saved to /var/cache/conftool/dbconfig/20200103-110028-marostegui.json
  • 10:20 moritzm: restarting apache on cloudmetrics* to pick up SASL security update
  • 10:11 moritzm: installing cyrus-sasl2 security updates on stretch/buster
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1074 schema change', diff saved to https://phabricator.wikimedia.org/P10032 and previous config saved to /var/cache/conftool/dbconfig/20200103-094252-marostegui.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1076 after schema change', diff saved to https://phabricator.wikimedia.org/P10031 and previous config saved to /var/cache/conftool/dbconfig/20200103-093829-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1076 schema change', diff saved to https://phabricator.wikimedia.org/P10030 and previous config saved to /var/cache/conftool/dbconfig/20200103-092107-marostegui.json
  • 08:17 marostegui: Deploy schema change on labswiki (wikitech) T234052
  • 07:10 marostegui: Deploy schema change on s2 codfw master, lag will appear on codfw - T234052
  • 06:57 marostegui: Deploy schema change on s6 eqiad hosts - T234052
  • 06:23 marostegui: Deploy schema change on db2089:3316
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2089:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10029 and previous config saved to /var/cache/conftool/dbconfig/20200103-062242-marostegui.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2087:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10028 and previous config saved to /var/cache/conftool/dbconfig/20200103-062148-marostegui.json

2020-01-02

  • 23:33 ejegg: updated Fundraising CiviCRM from d534f4e966 to 217a1f8c63
  • 23:09 ejegg: updated Fundraising CiviCRM from 6936aa0262 to d534f4e966
  • 22:44 ejegg: updated fundraising CiviCRM from f4db7fdb31 to 6936aa0262
  • 20:48 ejegg: updated Fundraising CiviCRM from abf0019c44 to f4db7fdb31
  • 20:30 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploying revert of temporary patch for T241503 (permissions clean-up) (duration: 00m 53s)
  • 19:57 sbassett@deploy1001: Synchronized wmf-config/CommonSettings.php: Deploying temporary patch for T241503 (permissions clean-up) (duration: 00m 54s)
  • 18:53 ejegg: re-enabled fundraising cron jobs
  • 18:29 ejegg: disabled fundraising cron jobs
  • 16:15 moritzm: restarting Apache on graphite* hosts to pick up SASL security update
  • 16:11 moritzm: restarting Apache on webperf* hosts to pick up SASL security update
  • 15:52 moritzm: restarting Apache on puppetboard* hosts to pick up SASL security update
  • 15:46 moritzm: restarting FPM on parsoid canary to pick up SASL security update
  • 14:27 marostegui: Deploy schema change on s6 codfw master (db2129) with replication - T234052
  • 14:22 marostegui: Deploy schema change on s5 eqiad hosts - T234052
  • 14:05 moritzm: restarting PHP/Apache on mw canaries to pick up SASL security update
  • 13:47 moritzm: installing cyrus-sasl security updates on Stretch/Buster
  • 13:23 marostegui: Deploy schema change on s5 codfw master (db2123) with replication - T234052
  • 13:17 moritzm: upgrading jessie servers to intel-microcode 3.20191115.2
  • 13:14 foks: scramble password for Windy906
  • 13:00 XioNoX: enable BFD traceoptions on cr1-eqiad and cr3-knams - T240659
  • 12:41 moritzm: upgrade recently reimaged hosts to puppet 5 T239832
  • 12:32 moritzm: upgrade recently reimaged hosts to facter 3 T239832
  • 12:09 jmm@cumin2001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 12:07 jmm@cumin2001: START - Cookbook sre.hosts.downtime
  • 11:53 moritzm: restarting FPM on scandium to clear opcache health
  • 11:42 moritzm: reimaging mw2277 to validate fix for puppet5/facter3 installation on new installs T239832
  • 11:23 arturo: import more openstack packages into stretch-wikimedia thirdparty/openstack-pike-stretch (T241347)
  • 10:43 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 10:40 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 09:58 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
  • 09:58 filippo@cumin1001: START - Cookbook sre.hosts.downtime
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2076 T241647', diff saved to https://phabricator.wikimedia.org/P10021 and previous config saved to /var/cache/conftool/dbconfig/20200102-085806-marostegui.json
  • 08:35 marostegui: Upgrade db2090
  • 08:26 marostegui: Upgrade db2075
  • 08:10 marostegui: Deploy schema change on officewiki.flow_wiki_ref on s3 master (db1123) T241387
  • 07:49 marostegui: Deploy schema change on techconductwiki.flow_wiki_ref (empty table) on s3 master (db1123) T241387
  • 07:26 marostegui: Upgrade db2079
  • 07:18 marostegui: Deploy schema change on labswiki.flow_wiki_ref (empty table) T241387
  • 06:46 marostegui: Deploy schema change on db2131 - T241387
  • 06:44 marostegui: Repool labsdb1009
  • 06:30 marostegui: Upgrade labsdb1009
  • 06:29 marostegui: Remove revision partitions from db2087:3316 T239453
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2087:3316 - T239453', diff saved to https://phabricator.wikimedia.org/P10020 and previous config saved to /var/cache/conftool/dbconfig/20200102-062650-marostegui.json
  • 06:22 marostegui: Depool labsdb1009
  • 00:22 ejegg: re-enabled fundraising cron jobs

2020-01-01

  • 21:13 ejegg: stopped fundraising cron jobs to calculate EOY summaries
  • 04:57 andrewbogott: depooling labweb1002 so I can hotfix labweb1001 for T240734


Archives

See Server admin log/Archives.