Server Admin Log

From Wikitech

2018-10-22

  • 13:19 marostegui: Run myloader for enwikivoyage cebwiki shwiki srwiki mgwiktionary on db2052 (s5 codfw master) - T184805
  • 13:12 kartik@deploy1001: Finished deploy [cxserver/deploy@5f53734]: Update cxserver to 7f996f3 (T207445) (duration: 03m 53s)
  • 13:08 kartik@deploy1001: Started deploy [cxserver/deploy@5f53734]: Update cxserver to 7f996f3 (T207445)
  • 11:51 zeljkof: eu swat finished
  • 11:49 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable rollbacker right on srwikisource (T206935) (duration: 00m 46s)
  • 11:37 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable autopatroller, patroller and rollbacker rights on srwikiquote (T206936) (duration: 00m 49s)
  • 11:28 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable suppressredirect and markbotedit rights to rollbackers on it.wikiversity (T207300) (duration: 00m 46s)
  • 11:21 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable cx2outreach campaign (T207031) (duration: 00m 47s)
  • 11:09 zfilipin@deploy1001: Synchronized static/images/project-logos/: SWAT: Anniversary logo for cswiki (T207589) (duration: 00m 47s)
  • 11:06 zfilipin@deploy1001: sync-file aborted: SWAT: Test if logo specified in wgLogo/wgLogoHD exists (T207053) (duration: 00m 02s)
  • 10:03 arturo: icinga downtime for cloudnet1003/4 for T206261
  • 09:16 marostegui: Remove replication filters from db2052 (s5 codfw master) - T184805
  • 09:04 marostegui: Run mydumper on db1100 for enwikivoyage cebwiki shwiki srwiki mgwiktionary - T184805
  • 08:58 marostegui: Stop replication in sync on db1100 and db2052 (codfw master) to reimport wikis - T184805
  • 08:33 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1100 - T184805 (duration: 00m 47s)
  • 08:29 moritzm: powercycling ms-be1018, stuck during reboot
  • 08:28 jynus: performing deletes on db1087 to fix wb_terms on labs
  • 08:27 marostegui: Deploy schema change on db2043 (s3 master) without replication - T204006
  • 08:22 marostegui: Disconnect codfw -> eqiad replication on s5 (db1070)
  • 08:19 marostegui: Disconnect codfw -> eqiad replication on s3 (db1075)
  • 08:13 marostegui: Disconnect codfw -> eqiad replication on es3 (es1017)
  • 08:11 marostegui: Disconnect codfw -> eqiad replication on es2 (es1015)
  • 08:08 marostegui: Disconnect codfw -> eqiad replication on x1 (db1069)
  • 08:05 marostegui: Disconnect codfw -> eqiad replication on s8 (db1071)
  • 08:03 marostegui: Disconnect codfw -> eqiad replication on s7 (db1062)
  • 08:01 marostegui: Disconnect codfw -> eqiad replication on s6 (db1061)
  • 07:59 marostegui: Disconnect codfw -> eqiad replication on s4 (db1068)
  • 07:57 marostegui: Disconnect codfw -> eqiad replication on s2 (db1066)
  • 07:52 marostegui: Disconnect codfw -> eqiad replication on s1 (db1067)
  • 07:38 moritzm: rebooting swift-be servers in eqiad for kernel security update
  • 07:24 godog: reformat ms-be2042 - T199198
  • 06:34 marostegui: Deploy schema change on db2036 - T204006
  • 06:11 marostegui: Deploy schema change on db2050 - T204006
  • 06:00 marostegui: Deploy schema change on db2057 - T204006
  • 05:47 marostegui: Deploy schema change on s3 db2074 (and db2094 sanitarium) - T204006
  • 05:31 marostegui: Deploy schema change on dbstore2002:3313 - T204006
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Clarify db2033 BBU status (duration: 00m 49s)
  • 04:37 kartik@deploy1001: Finished deploy [cxserver/deploy@904151f]: Update cxserver to eee8974 (T207070, T203077, T199529) (duration: 05m 42s)
  • 04:31 kartik@deploy1001: Started deploy [cxserver/deploy@904151f]: Update cxserver to eee8974 (T207070, T203077, T199529)

2018-10-21

  • 22:15 onimisionipe: repooling wdqs1003 as it has caught up on lag
  • 20:42 banyek: resuming replication on s4@dbstore2002 (T204930)
  • 16:15 reedy@deploy1001: Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 04m 52s)
  • 15:57 bawolff: adjust patch for T194204
  • 12:39 onimisionipe: depooling wdqs1003 to catch up on lag

2018-10-20

  • 23:05 reedy@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/CentralAuth/: Update setEmail (duration: 00m 55s)
  • 21:29 gehel: repooling wdqs1003 (still some lag, but 100[45] are starting to be impacted)
  • 19:54 gehel: depooling wdqs1003 to catch up on lag
  • 13:53 reedy@deploy1001: Synchronized php-1.32.0-wmf.26/includes/auth/AuthManager.php: (no justification provided) (duration: 00m 55s)
  • 12:46 hoo@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Add CentralAuth related permissions to stewards at metawiki (T207531) (duration: 01m 09s)
  • 05:38 marostegui: Force writeback on db2033 - T184888

2018-10-19

  • 20:33 twentyafterfour: deployed RCFilters: Fix completely broken highlight circles refs T207472
  • 20:32 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/resources/src/mediawiki.rcfilters/styles/: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468636/ (duration: 00m 54s)
  • 20:31 twentyafterfour: deploying https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468636/ to the full cluster.
  • 20:28 twentyafterfour: deployed https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468636/ to mwdebug1002
  • 19:20 mutante: ns0 / ns1 - authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsdctl reload-zones - to add new language shn (T206777)
  • 19:16 mutante: ns2/multatuli - gdnsdctl reload-zones
  • 19:12 mutante: labweb1001 / wikitech - disabling 2FA for myself, logging in, then re-enabling it
  • 17:49 ejegg: updated fundraising CiviCRM from 83874e75ba to 1f10dc8a18
  • 17:47 mutante: DNS - 'authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones' - needed when adding new languages to langs.tmpl - adding "shn" (Shan language) T206777
  • 16:36 XioNoX: deactivate BGP to 15426 in ams-ix (down and no reply to emails) - T207428
  • 14:16 banyek: disconnecting s4 replication on dbstore2002 (T204930)
  • 14:12 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Remove useless comments (duration: 00m 54s)
  • 13:58 vgutierrez: Uploaded certcentral 0.2 to apt.wikimedia.org (stretch) - T207457
  • 11:46 banyek: starting compression of s4 tables @dbstore2002 (T204930)
  • 11:33 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T207313 UBN - Revert back wikidata for change_tag backend (duration: 00m 59s)
  • 10:53 arturo: icinga downtime for 2h for cloudnet1003/1004 to deploy patch related to T206261
  • 09:37 godog: bump /proc/sys/net/core/rmem_default temporarily to 6MB and bounce statsd-proxy statsite-instances on graphite1004 - T196484
  • 08:53 banyek: adding wmf-pt-kill_2.2.20-1+wmf4 package for stretch (T206521)
  • 08:28 jynus: stopping db1092 and db1087 in sync
  • 07:50 godog: bump /proc/sys/net/core/rmem_default temporarily to 2MB and bounce statsd-proxy statsite-instances on graphite1004 - T196484
  • 07:20 marostegui: Remove mwmaint1001 grants from m5 - https://phabricator.wikimedia.org/T201343 https://phabricator.wikimedia.org/T192457
  • 07:15 godog: powercycle ms-be1021, [19601329.556259] sd 0:1:0:1: rejecting I/O to offline device
  • 07:05 godog: bump /proc/sys/net/core/rmem_default temporarily to 1MB and bounce statsd-proxy statsite-instances on graphite1004 - T196484
  • 06:13 marostegui: Deploy schema change on s7 codfw host by host without replication - T204006
  • 05:58 marostegui: Deploy schema change on s2 codfw host by host without replication - T204006
  • 05:25 marostegui: Deploy schema change on s1 codfw host by host without replication - T204006
  • 01:49 krinkle@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/WikimediaEvents/includes/WikimediaEventsHooks.php: Ic74a9d5601b8c (duration: 00m 55s)

2018-10-18

  • 22:00 mutante: lvs1011, lvs1012 - manually editing nagios NRPE config and restarting service (to make monitoring from icinga1001 work; puppet is disabled)
  • 21:52 mutante: eeden - manually editing nagios NRPE config and restarting service (to make monitoring from icinga1001 work; puppet is disabled)
  • 21:49 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.32.0-wmf.26 refs T191072
  • 21:46 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/includes/filerepo/file/LocalFile.php: sync Id97e1c refs T207419 (duration: 00m 53s)
  • 21:29 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/includes/filerepo/file/LocalFile.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/468470/ refs T207419 (duration: 00m 54s)
  • 20:49 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group2 wikis to 1.32.0-wmf.24 refs T191072
  • 20:39 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.26
  • 20:21 volans: start ferm on db2042, it failed to start at reboot due to DNS resolution timeout
  • 19:22 ejegg: updated SmashPig standalone deploy from 5f21d3f2db to 581c685326
  • 19:21 ejegg: updated payments-wiki from a3892e4ed3 to 06848600ed
  • 19:17 shdubsh: rebooting graphite1004
  • 19:11 shdubsh: upping ring buffer size on graphite1004 in an attempt to mitigate dropped packets at the interface -- T196484
  • 19:02 sbisson@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/PageTriage/: SWAT: Use Main Object Stash for keeping track of PageTriage last use (duration: 00m 54s)
  • 18:19 awight: Restarting ORES services for T88997
  • 17:33 ladsgroup@deploy1001: Finished deploy [ores/deploy@4ac4c8b]: Logstash support for ores: T181546 T169586 T168921 T181630 T205256 (duration: 23m 48s)
  • 17:19 herron: aborted enabling kafka on logstash elasticsearch cluster due to puppet errors. reverted change T206454
  • 17:09 ladsgroup@deploy1001: Started deploy [ores/deploy@4ac4c8b]: Logstash support for ores: T181546 T169586 T168921 T181630 T205256
  • 17:00 twentyafterfour@deploy1001: Synchronized php: group1 wikis to 1.32.0-wmf.26 refs T191072 (duration: 00m 53s)
  • 16:59 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.26 refs T191072
  • 16:57 herron: enabling kafka on logstash elasticsearch cluster T206454
  • 16:55 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/WikibaseQualityConstraints/src/ServiceWiring.php: sync https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/WikibaseQualityConstraints/+/468352/ refs T207394 (duration: 00m 54s)
  • 16:52 mobrovac@deploy1001: Finished deploy [restbase/deploy@6c879fa]: Have 100% of traffic directed to Proton as well - T186748 (duration: 20m 52s)
  • 16:31 mobrovac@deploy1001: Started deploy [restbase/deploy@6c879fa]: Have 100% of traffic directed to Proton as well - T186748
  • 15:51 XioNoX: trunk cloud-instances2-b-eqiad between asw-b-eqiad and asw2-b-eqiad
  • 15:50 cmjohnson1: disabling checks on cloudvirt1019 for maintenance
  • 15:42 twentyafterfour: twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.32.0-wmf.24 refs T191072 (duration: 00m 53s)
  • 15:35 twentyafterfour@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 14:46 moritzm: installing tomcat8 security updates
  • 14:34 moritzm: remove labvirt1018 from debmonitor (T207317)
  • 14:28 godog: temporarily bump default socket receive memory to 1MB on graphite1001, restart statsd-proxy and statsite
  • 14:22 godog: begin reformat of ms-be2041 - T199198
  • 14:21 banyek: shutting down mysql and powering down db2042 (T202051)
  • 14:13 godog: correction to the statements above: graphite1004, not graphite1001
  • 14:11 godog: ditto for statsite instances on graphite1001, temporarily bump receive socket memory to 1MB and bounce the service
  • 14:08 godog: temporarily bump receive socket memory for statsd-proxy on graphite1001 and bounce the service
  • 13:51 moritzm: installing libidn security updates
  • 12:59 moritzm: installing libssh security updates
  • 12:55 godog: bounce statsd-proxy on graphite1001
  • 11:59 addshore: SWAT done
  • 11:59 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Wikidata.org: enable sense data type T203888 (duration: 00m 54s)
  • 11:54 mobrovac@deploy1001: Finished deploy [restbase/deploy@1041a02]: Disable onthisday check - T203588 (duration: 21m 23s)
  • 11:54 zfilipin@deploy1001: Synchronized tests/InitialiseSettingsTest.php: SWAT: Test if logo specified in wgLogo/wgLogoHD exists (T207053) (duration: 00m 53s)
  • 11:49 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix typo in IS.php: use ltwiki instead of ltwikipedia (T207081) (duration: 00m 54s)
  • 11:39 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use testwikidatawiki instead of testwikidata in IS.php (T207089) (duration: 00m 53s)
  • 11:33 mobrovac@deploy1001: Started deploy [restbase/deploy@1041a02]: Disable onthisday check - T203588
  • 11:29 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Use new wordmarks in uzwiki (T205226) (duration: 00m 53s)
  • 11:10 zfilipin@deploy1001: Synchronized static/images/mobile/copyright/: SWAT: Upload uz specific wordmark (T205226) (duration: 00m 54s)
  • 10:59 addshore: wikidata senses deploy slot done
  • 10:57 addshore: addshore@mwmaint1002:~$ mwscript purgeList.php --wiki wikidatawiki --namespace 146
  • 10:57 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #4 (duration: 03m 52s)
  • 10:55 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: RejectParserCacheValue Wikidata lexemes before sense deployment T203888 (duration: 00m 54s)
  • 10:54 addshore@deploy1001: sync-file aborted: RejectParserCacheValue Wikidata lexemes before sense deployment T203888 (duration: 00m 00s)
  • 10:53 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #4
  • 10:53 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #3 (duration: 04m 13s)
  • 10:51 addshore@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikibaseLexeme: Wikidata: Make statement group IDs on Senses unique (duration: 00m 59s)
  • 10:49 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #3
  • 10:49 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #2 (duration: 07m 32s)
  • 10:41 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call, take #2
  • 10:41 mobrovac@deploy1001: Finished deploy [restbase/deploy@88c8f26]: Parallelise onthisday call - T203588 (duration: 11m 24s)
  • 10:34 addshore@deploy1001: Synchronized wmf-config/Wikibase-production.php: Combine if blocks in Wikibase-production NOOP (duration: 00m 53s)
  • 10:32 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896 (duration: 00m 29s)
  • 10:31 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896
  • 10:31 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896 (duration: 01m 37s)
  • 10:31 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: BETA ONLY Remove wgLexemeEnableSenses from IS-labs (duration: 00m 53s)
  • 10:30 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (5) - T205896
  • 10:29 mobrovac@deploy1001: Started deploy [restbase/deploy@88c8f26]: Parallelise onthisday call - T203588
  • 10:28 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (4) - T205896 (duration: 00m 05s)
  • 10:28 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (4) - T205896
  • 10:15 addshore: purging wikidata lexemes
  • 10:12 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (3) - T205896 (duration: 00m 29s)
  • 10:11 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (3) - T205896
  • 10:10 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable senses on wikidatawiki T203888 (duration: 00m 53s)
  • 10:09 volans@deploy1001: Finished deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (2) - T205896 (duration: 02m 01s)
  • 10:07 volans@deploy1001: Started deploy [netbox/deploy@1cd4d43]: Upgrade to upstream v2.4.6 (2) - T205896
  • 10:00 volans@deploy1001: Finished deploy [netbox/deploy@438f1c0]: Upgrade to upstream v2.4.6 - T205896 (duration: 03m 07s)
  • 09:57 volans@deploy1001: Started deploy [netbox/deploy@438f1c0]: Upgrade to upstream v2.4.6 - T205896
  • 09:52 XioNoX: activate bgp group Customer6 on cr4-ulsfo
  • 09:20 banyek: enabling replication monitor check on pc1005 pc1006 pc2005 pc2006 (T206992)
  • 09:18 godog: bounce statsd-proxy on graphite1001
  • 09:08 moritzm: powercycling ms-be2019, stuck during reboot
  • 09:01 banyek: enabling replication monitor check on pc1004 (T206992)
  • 08:56 banyek: enabling replication monitor check on pc2004 (T206992)
  • 08:41 banyek: disabling puppet on parser caches (T206992)
  • 08:40 banyek: adding replication monitoring checks to parsercache hosts (T206992)
  • 08:26 vgutierrez: Uploaded certcentral 0.1-2 to apt.wikimedia.org (stretch)
  • 07:56 moritzm: rebooting swift backend servers in codfw for spectre v3/v4/L1TF security updates
  • 07:43 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: Wikidata dispatch: reduce concurrent dispatchers to 2 (duration: 00m 59s)
  • 05:34 marostegui: Restarting a failed s8 backup from dbstore1001 to db1116:3318
  • 05:05 XioNoX: start office-DC link renumbering - T205985
  • 02:51 ejegg: updated fundraising CiviCRM from 7b8d33bb4e to 83874e75ba
  • 00:32 twentyafterfour: restarting apache on phab1001 to apply b3bfff1

2018-10-17

  • 22:56 awight: Restarting ORES uwsgi service for T88997
  • 22:38 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.32.0-wmf.26 refs T191072
  • 22:36 robh: bast4001 reboot is my fault; power cables were jostled when I was decommissioning lvs4002 right above it in the rack
  • 22:31 ejegg: updated fundraising CiviCRM from 5eac0634e6 to 7b8d33bb4e
  • 22:24 ejegg: updated payments-wiki from 0385ad02a7 to a3892e4ed3
  • 22:21 ppchelko@deploy1001: Finished deploy [restbase/deploy@88c8f26] (dev-cluster): Spread requests between MCS nodes for onthisday (duration: 02m 54s)
  • 22:18 ppchelko@deploy1001: Started deploy [restbase/deploy@88c8f26] (dev-cluster): Spread requests between MCS nodes for onthisday
  • 20:50 arlolra: Updated Parsoid to e6b708b (T204622, T187848, T207093)
  • 20:40 arlolra@deploy1001: Finished deploy [parsoid/deploy@babf1da]: Updating Parsoid to e6b708b (duration: 08m 41s)
  • 20:32 arlolra@deploy1001: Started deploy [parsoid/deploy@babf1da]: Updating Parsoid to e6b708b
  • 20:17 mobrovac@deploy1001: Started restart [proton/deploy@a657059]: (no justification provided)
  • 20:10 ejegg: updated fundraising CiviCRM from 4cc21d61c5 to 5eac0634e6
  • 19:26 shdubsh: restart eventlogging for statsd DNS change - T88997
  • 19:23 twentyafterfour: Mediawiki train is still blocked by T207288
  • 19:19 godog: restart zuul for statsd DNS change - T88997
  • 19:12 mutante: scb1003 - restart pdfrender
  • 19:09 godog: roll-restart eventbus for statsd DNS change - T88997
  • 19:00 krinkle@deploy1001: Synchronized php-1.32.0-wmf.26/includes/cache/: T193271 - I25aa0e27200a0 (duration: 01m 01s)
  • 18:57 awight: Restarting ORES cluster to refresh DNS, T88997
  • 18:48 banyek: repooling labsdb1009 (T181650)
  • 18:48 shdubsh: restart navtiming on webperf nodes
  • 18:39 godog: restart jmxtrans on kafka hosts
  • 18:17 shdubsh: moving statsd cname to graphite1004
  • 18:07 banyek: depooling labsdb1009 (T181650)
  • 17:08 banyek: depooling labsdb1009 (T181650)
  • 16:53 banyek: repooling labsdb1011
  • 15:53 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.26/extensions/AbuseFilter/: sync AbuseFilter revision 4e2a6b6 to 1.32.0-wmf.26 refs T207220 (duration: 00m 58s)
  • 15:34 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: Enabling db2096 for x1 (duration: 00m 56s)
  • 15:31 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: Enabling db2096 for x1 (duration: 00m 56s)
  • 15:28 banyek: enabling db2096 for cluster x1 (T206593)
  • 14:33 godog: upload prometheus-statsd-exporter 0.7.0+ds1-2 - T205870
  • 14:01 marostegui: Repool labsdb1010, depool labsdb1011 - T181650
  • 13:08 gehel: applying rps NIC config for all wdqs nodes - T206105
  • 13:05 banyek: depooling labsdb1010 (T181650)
  • 12:56 banyek: enabling notifications on db2096 (T206593)
  • 12:55 banyek: enabling notifications on db2096
  • 11:40 Amir1: EU SWAT is done
  • 11:40 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from new backend of change tag everywhere (T194164) (duration: 00m 57s)
  • 11:32 moritzm: installing graphicsmagick security updates
  • 11:30 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T206702 Enable client side error counting on Minerva (duration: 00m 57s)
  • 11:26 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T207196 gerrit:467736 Wikidata: enable JSON-LD data format on test.wikidata.org (duration: 00m 56s)
  • 11:21 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: T207196 Wikidata: add setting for setting the enabled entity data forms gerrit:467735 PT 2/2 (duration: 00m 56s)
  • 11:19 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: T207196 Wikidata: add setting for setting the enabled entity data forms gerrit:467735 PT 1/2 (duration: 00m 57s)
  • 11:17 Amir1: ladsgroup@mwmaint1002:~$ mwscript deleteLocalPasswords.php --wiki=enwiki --delete --batch-size 200 (This will cause lag on codfw)
  • 11:15 addshore@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: T205611 T205330 Remove Wikidata RejectParserCacheValue hook gerrit:467913 (duration: 00m 56s)
  • 11:11 addshore@deploy1001: Synchronized wmf-config/Wikibase.php: SWAT: Increase wikidata dispatch randomness to 30 (duration: 00m 56s)
  • 11:08 addshore@deploy1001: Synchronized wmf-config/Wikibase-production.php: SWAT: T207019 gerrit:467343 Enable WBQualityConstraintsSuggestionsBetaFeature on wikidatawiki (duration: 00m 56s)
  • 11:04 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT gerrit:467691 Add constraint-suggestions to wgBetaFeaturesWhitelist (duration: 01m 10s)
  • 11:04 ariel@deploy1001: Finished deploy [dumps/dumps@ed7eed9]: use lbzip2 for recombine steps if configured (duration: 00m 03s)
  • 11:04 ariel@deploy1001: Started deploy [dumps/dumps@ed7eed9]: use lbzip2 for recombine steps if configured
  • 09:34 XioNoX: update interfaces and BGP IPs for office-DC link (DC side, interfaces still disabled) - T205985
  • 09:30 banyek: truncating parsercache tables on pc2006 (T206740)
  • 09:12 _joe_: re-enabling puppet (not running it) in codfw
  • 09:12 _joe_: change applied to all appservers serving traffic
  • 09:08 _joe_: running puppet on all apaches (appserver/api) in eqiad to pick up the wikipedia.org vhost refactor
  • 09:05 _joe_: running puppet on mwdebug1001, then testing wikipedia.org again for regressions
  • 09:04 _joe_: puppet disabled on the appservers, now merging the wikipedia.org conversion to mediawiki::web::vhost
  • 08:43 mobrovac@deploy1001: Started restart [proton/deploy@a657059]: (no justification provided)
  • 08:30 kartik@deploy1001: Finished deploy [cxserver/deploy@b30a323]: Update cxserver to 29e01e4 (T206305, T204668) (duration: 03m 54s)
  • 08:27 kartik@deploy1001: Started deploy [cxserver/deploy@b30a323]: Update cxserver to 29e01e4 (T206305, T204668)
  • 08:09 banyek: stopping binlog purgers on the parsercache hosts (the binlogs will be kept for 24hrs) - T206740
  • 08:00 banyek: truncating parsercache tables on pc2005 (T206740)
  • 06:52 jynus: fixing s8 master drifts T206743
  • 02:10 ejegg: updated payments-wiki from 7fb1aae963 to 0385ad02a7
  • 01:24 legoktm@deploy1001: Synchronized wmf-config/CommonSettings.php: Add REL1_32 to ExtensionDistributor (duration: 00m 59s)

2018-10-16

  • 22:11 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 6 (duration: 01m 18s)
  • 22:09 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 6
  • 22:09 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 5 (duration: 05m 16s)
  • 22:04 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 5
  • 22:04 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 4 (duration: 03m 53s)
  • 22:00 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 4
  • 22:00 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 3 (duration: 04m 15s)
  • 21:58 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.32.0-wmf.24 refs T191072
  • 21:55 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 3
  • 21:55 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 2 (duration: 09m 11s)
  • 21:46 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required, take 2
  • 21:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required (duration: 03m 53s)
  • 21:42 ppchelko@deploy1001: Started deploy [restbase/deploy@d9e3a09]: Downgrade major-greater to minor-greater if no-cache is required
  • 21:18 twentyafterfour@deploy1001: rebuilt and synchronized wikiversions files: group0 wikis to 1.32.0-wmf.26 refs T191072
  • 20:55 twentyafterfour@deploy1001: Finished scap: Syncing 1.32.0-wmf.26 refs T191072 (duration: 26m 32s)
  • 20:28 twentyafterfour@deploy1001: Started scap: Syncing 1.32.0-wmf.26 refs T191072
  • 20:14 shdubsh: restarted pdfrender on scb1003
  • 18:44 ppchelko@deploy1001: Started restart [proton/deploy@a657059]: Try restarting again for metrics
  • 18:43 ppchelko@deploy1001: Started restart [proton/deploy@a657059]: Try restarting again for metrics
  • 18:42 ppchelko@deploy1001: Finished deploy [proton/deploy@a657059]: Try restarting for metrics (duration: 00m 20s)
  • 18:42 ppchelko@deploy1001: Started deploy [proton/deploy@a657059]: Try restarting for metrics
  • 17:01 _joe_: restarted pdfrender on scb1004
  • 16:33 akosiaris: depool restbase-async from eqiad in order to test traffic going to parsoid codfw
  • 16:15 _joe_: disabled puppet on all appservers, merging wikidata apache change, re-enabling puppet on mwdebug1001 for testing
  • 14:51 mobrovac@deploy1001: Finished deploy [proton/deploy@a657059]: Rollback to puppeteer v1.5.0 - T186748 (duration: 00m 49s)
  • 14:51 mobrovac@deploy1001: Started deploy [proton/deploy@a657059]: Rollback to puppeteer v1.5.0 - T186748
  • 14:28 godog: roll-restart elasticsearch on logstash100[456] to change elasticsearch data dir - T206454
  • 14:06 godog: depool in turn logstash1008 and logstash1009 to change elasticsearch data dir - T206454
  • 13:55 godog: depool logstash1007 to change elasticsearch data dir - T206454
  • 13:54 XioNoX: router back and healthy, enable external BGP sessions on cr2-eqdfw - T203261
  • 13:51 moritzm: rebooting acamar for update to stretch-proposed-updates kernel
  • 13:44 XioNoX: reboot cr2-eqdfw for upgrade - T203261
  • 13:43 XioNoX: disable external BGP sessions on cr2-eqdfw - T203261
  • 13:43 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on group 0 (T166733) (duration: 00m 50s)
  • 13:34 XioNoX: start install process on cr2-eqdfw (non impacting before reboot) - T203261
  • 13:11 akosiaris: pool codfw for apertium|citoid|cxserver|eventbus|eventstreams|graphoid|mathoid|mobileapps|ores|parsoid|pdfrender|proton|recommendation-api|restbase|restbase-async|wdqs|wdqs-internal|zotero
  • 13:11 akosiaris@puppetmaster1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=^apertium|citoid|cxserver|eventbus|eventstreams|graphoid|mathoid|mobileapps|ores|parsoid|pdfrender|proton|recommendation-api|restbase|restbase-async|wdqs|wdqs-internal|zotero$
  • 13:08 elukey: restart memcached on mc1035 with -R 200 (will wipe the object cache shard as consequence) - T203786
  • 12:57 akosiaris: pool mathoid eqiad
  • 12:52 gtirloni: T186571 removed legofan4000 user from project-tools group (leftover from T165624 legofan4000->macfan4000 rename)
  • 12:44 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:43 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 12:43 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --reset-values -f mathoid.yaml [namespace: mathoid, clusters: eqiad]
  • 12:35 akosiaris: depool eqiad mathoid for helm chart upgrade
  • 12:32 akosiaris: pool codfw mathoid
  • 12:14 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:14 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 12:14 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --reset-values -f mathoid.yaml [namespace: mathoid, clusters: codfw]
  • 12:08 Amir1: EU SWAT is done
  • 12:08 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from new backend of change_tag in s7 (T194164) (duration: 00m 50s)
  • 12:03 akosiaris@deploy1001: scap-helm mathoid finished
  • 12:03 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 12:03 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/includes/changetags/ChangeTags.php: SWAT: Avoid fatals when the filter tags is empty (T194164) (duration: 00m 50s)
  • 12:03 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --set main_app.limits.memory=1G [namespace: mathoid, clusters: codfw]
  • 12:02 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid --set main_app.limits.memory=1g [namespace: mathoid, clusters: codfw]
  • 11:49 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Re-enable search integration for ArticlePlaceholder (T195751) (duration: 00m 50s)
  • 11:38 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Translate on idwikimedia (T204292) (duration: 00m 49s)
  • 11:32 banyek: the binlog purging stopped on pc2004 (T206740)
  • 11:27 akosiaris: upgrade mathoid chart to version 0.0.12
  • 11:26 akosiaris@deploy1001: scap-helm mathoid finished
  • 11:26 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 11:26 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: codfw]
  • 11:24 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for editathon at University of North Carolina at Charlotte (T207043) (duration: 00m 49s)
  • 11:18 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add new throttle rule for WMCL Editathon (T206914) (duration: 00m 49s)
  • 11:09 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Add throttle rule for "Night of the Digital Language" (T206408) (duration: 00m 49s)
  • 11:05 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Remove expired throttle rule (T207015) (duration: 00m 50s)
  • 11:02 banyek: truncating tables in parsecache@pc2004 (T206740)
  • 10:52 moritzm: rolling reboot of thumbor in eqiad for kernel security updates
  • 10:50 godog: run puppet on scb to deploy db configuration for recommendation-service
  • 10:37 banyek: stopping pc2005 -> pc1005 replication (T206740)
  • 10:37 banyek: stopping pc2006 -> pc1006 replication (T206740)
  • 10:22 jynus: running database maintenance tasks on cumin1001, expect very high memory usage
  • 09:53 akosiaris: upload blubber_0.6.0-1_amd64 to apt.wikimedia.org/jessie-wikimedia/main and apt.wikimedia.org/stretch-wikimedia/main T206766
  • 09:03 moritzm: rolling reboot of thumbor in codfw for kernel security updates
  • 08:56 banyek: stopping pc2004 -> pc1004 replication (T206740)
  • 08:42 moritzm: removed mwmaint1001 from debmonitor (T192457)
  • 07:46 akosiaris: upgrade apertium-apy throughout the fleet T199447
  • 07:22 akosiaris: upload apertium-apy_0.11.4-1+wmf1 to apt.wikimedia.org/jessie-wikimedia/main T199447
  • 07:20 akosiaris@deploy1001: scap-helm mathoid finished
  • 07:19 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 07:19 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: codfw]
  • 07:19 akosiaris@deploy1001: scap-helm mathoid upgrade [namespace: mathoid, clusters: codfw]
  • 07:17 moritzm: installing net-snmp security updates
  • 06:32 legoktm@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Revert "Enable reading from new backend of change_tag in s7" (T194164) (duration: 00m 50s)
  • 06:05 jynus: stopping db1092 and db1087 in sync T206743
  • 05:10 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Remove db1092 BBU comments after BBU replacement (duration: 00m 52s)
  • 00:23 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@c81dd9e]: Redeploy Updater for removal of props channel (duration: 10m 21s)
  • 00:13 smalyshev@deploy1001: Started deploy [wdqs/wdqs@c81dd9e]: Redeploy Updater for removal of props channel

2018-10-15

  • 20:52 arlolra: Updated Parsoid to 8f3ff40 (T205642, T206003, T187848, T205455, T205743)
  • 20:37 bsitzmann@deploy1001: Finished deploy [mobileapps/deploy@834d00a]: Update mobileapps to c2a4ef9 (T206701 T206467 T168875) (duration: 03m 47s)
  • 20:34 bsitzmann@deploy1001: Started deploy [mobileapps/deploy@834d00a]: Update mobileapps to c2a4ef9 (T206701 T206467 T168875)
  • 20:32 arlolra@deploy1001: Finished deploy [parsoid/deploy@b758124]: Updating Parsoid to 8f3ff40 (duration: 11m 43s)
  • 20:20 arlolra@deploy1001: Started deploy [parsoid/deploy@b758124]: Updating Parsoid to 8f3ff40
  • 19:37 mforns@deploy1001: Finished deploy [analytics/refinery@3f4adf8]: deploy refinery together with source version 0.0.78 without all removed old jars (duration: 05m 18s)
  • 19:33 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: Redeploy 1010 (duration: 00m 28s)
  • 19:33 smalyshev@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: Redeploy 1010
  • 19:32 mforns@deploy1001: Started deploy [analytics/refinery@3f4adf8]: deploy refinery together with source version 0.0.78 without all removed old jars
  • 19:27 mforns@deploy1001: Finished deploy [analytics/refinery@1fc53d9]: deploy refinery together with source version 0.0.78 (duration: 15m 56s)
  • 19:11 mforns@deploy1001: Started deploy [analytics/refinery@1fc53d9]: deploy refinery together with source version 0.0.78
  • 18:59 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from new backend of change_tag in s7 (T194164) (duration: 00m 49s)
  • 18:59 mutante: LDAP - added crusnov to wmf and ops groups
  • 18:51 tgr: pulled gerrit 467315 to mwdeploy1001 (no-op, no scap needed)
  • 18:47 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: GUI updates and new Updater build (duration: 13m 57s)
  • 18:44 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: cswikivoyage has HD logo even though the project doesn't exist (T207066) (duration: 00m 49s)
  • 18:39 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Disable AICaptcha data collection (T186244) (duration: 00m 49s)
  • 18:33 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix a typo in wgLogoHD (mapwiki => napwiki) T207056, Remove techcomwikis row in wgLogo, techcomwiki doesn't exist T207056 (duration: 00m 48s)
  • 18:33 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: GUI updates and new Updater build
  • 18:30 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: Beta: Show share button on mobile web for beta user (no-op) (duration: 00m 49s)
  • 18:14 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: enable senses on testwikidatawiki T203887 (duration: 00m 49s)
  • 18:10 addshore@deploy1001: Synchronized wmf-config/Wikibase-production.php: SWAT: T207019 Enable WBQualityConstraintsSuggestionsBetaFeature on testwikidatawiki (duration: 00m 49s)
  • 18:01 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009) (duration: 02m 11s)
  • 17:59 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009)
  • 17:57 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009) (duration: 02m 10s)
  • 17:55 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@ff3bf90]: Test deployment - GUI update and new Updater build(wdqs1009)
  • 16:54 marostegui: Start replication on db1087 and db1092 to avoid them lagging behind the whole night (nothing running there at this time)
  • 16:36 cmjohnson1: replacing pem0 on asw2-a7-eqiad T206972
  • 16:18 _joe_: restart prometheus-mcrouter-exporter.service across the fleet
  • 15:39 marostegui: Stop MySQL and poweroff db1092 for BBU replacement - T205514
  • 15:31 andrewbogott: restarting slapd on seaborgium as a test for T205463
  • 15:14 cmjohnson1: replacing optics asw2-b fpc2-fpc8
  • 15:13 mforns@deploy1001: Finished deploy [analytics/refinery@9b288c5]: deploy refinery together with source version 0.0.77 (duration: 20m 19s)
  • 14:53 mforns@deploy1001: Started deploy [analytics/refinery@9b288c5]: deploy refinery together with source version 0.0.77
  • 14:46 marostegui: Ease consistency replication options on db2048 to mitigate lag
  • 14:29 moritzm: rebooting backup2001 for some tests
  • 13:35 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting MCR migration stage to write-both/read-new on Commons (T198308) (duration: 00m 49s)
  • 13:32 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: adding db2096 to hosts (and repooling db2069) (duration: 00m 49s)
  • 13:30 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T206593: adding db2096 to hosts (and repooling db2069) (duration: 00m 49s)
  • 13:16 jynus: stopping db1092 and db1087 in sync T206743
  • 13:10 Jeff_Green: authdns-update to deploy saiph->frpig2001 rename
  • 13:02 godog: upload prometheus-statsd-exporter 0.7.0 - T205870
  • 12:45 banyek: rebooting db2096
  • 12:44 gehel: resetting kafka offsets on wdqs public cluster
  • 12:44 elukey: complete rolling restart of eventbus on kafka[12]00[1-3] for python security upgrades (only codfw was done)
  • 12:41 elukey: upgrade prometheus-memcached-exporter on swift and thumbor
  • 11:57 Amir1: start of mwscript deleteLocalPasswords.php --delete --batch-size 200 on all wikis
  • 11:38 zeljkof: EU SWAT finished
  • 11:29 hoo: Started rebuildItemsPerSite on mwmaint1002 (T44325). Can be killed at any time, if necessary.
  • 11:26 zfilipin@deploy1001: scap failed: average error rate on 4/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 11:09 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable reading from ct_tag_id in s7 (T194164) (duration: 00m 49s)
  • 10:57 moritzm: installing ghostscript security updates for jessie
  • 10:47 moritzm: installing tomcat7 security updates
  • 10:43 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 49s)
  • 09:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1092 for recloning - T206743 (duration: 00m 49s)
  • 09:45 marostegui: Stop MySQL on db1116:3318 to reclone db1092
  • 09:41 banyek: max_binlog_size is set back to 1048576000 on ParseCache hosts (T206740)
  • 09:23 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Restore original weight for db1104 (duration: 00m 49s)
  • 09:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase traffic for db1104 (duration: 00m 48s)
  • 08:58 banyek@deploy1001: Synchronized wmf-config/db-codfw.php: T206593: depooling db2069 (duration: 00m 48s)
  • 08:50 elukey: restart hadoop yarn resource managers on an-master* to pick up new jvm settings
  • 08:49 XioNoX: repool eqsin - T206861
  • 08:48 banyek: depooling db2033 (T206593)
  • 08:46 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1104 - T206743 (duration: 00m 49s)
  • 08:17 moritzm: installing imagemagick security update
  • 07:57 godog: reformat ms-be2040 with crc=1 finobt=0 - T199198
  • 07:32 banyek: reimaging db2096 (T206593)
  • 07:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 - T206743 (duration: 00m 48s)
  • 07:15 marostegui: Stop MySQL at db1116:3318 to clone db1104
  • 07:12 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1104 - T206743 (duration: 00m 49s)
  • 07:04 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Fully repool db1109 (duration: 00m 49s)
  • 06:55 XioNoX: add v6 monitoring for mr1-ulsfo OOB - T206778
  • 06:49 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase weight for db1109 (duration: 00m 49s)
  • 06:40 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Slowly repool db1109 (duration: 00m 50s)
  • 05:20 kartik@deploy1001: Finished deploy [cxserver/deploy@fd74c3b]: Update cxserver to b51f363 (T203077, T99934, T203550) (duration: 04m 25s)
  • 05:16 kartik@deploy1001: Started deploy [cxserver/deploy@fd74c3b]: Update cxserver to b51f363 (T203077, T99934, T203550)
  • 05:16 marostegui: Stop MySQL on db1109 for recloning - T206743
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 50s)
  • 05:11 marostegui: Stop MySQL on db1116:3318 to use it to clone db1109
  • 03:18 kartik@deploy1001: Finished deploy [cxserver/deploy@5a70ef1]: Update cxserver to 47a864b (T205420, T203077, T205700, T205616) (duration: 04m 44s)
  • 03:14 kartik@deploy1001: Started deploy [cxserver/deploy@5a70ef1]: Update cxserver to 47a864b (T205420, T203077, T205700, T205616)
  • 00:45 krinkle@deploy1001: Synchronized multiversion/MWRealm.php: I79fb3d194a58: use env.php (duration: 00m 49s)
  • 00:08 krinkle@deploy1001: Synchronized wmf-config/: I79fb3d194a: add env.php file (not yet used) (duration: 00m 50s)

2018-10-14

  • 23:42 krinkle@deploy1001: Synchronized multiversion/getMWVersion: Ice9a74e73481 no-op (duration: 00m 49s)
  • 23:21 krinkle@deploy1001: Synchronized wmf-config/ProductionServices.php: If4d8faa4 (duration: 00m 48s)
  • 21:48 krinkle@deploy1001: Synchronized multiversion/MWMultiVersion.php: I83b2bdd53c13e (duration: 00m 50s)
  • 20:47 krinkle@deploy1001: Synchronized wmf-config/import.php: beta-only (duration: 00m 54s)
  • 16:34 volans: forcing a puppet run on all eqsin hosts with batch 1 to clear most of the alarms - T206861
  • 08:54 elukey: restart Yarn resource manager on an-master1002 to force an-master1001 to take the leadership back - T206943
  • 08:34 elukey: powercycle restbase1015 (frozen, no ssh, no metrics, no root console via serial available)
  • 00:48 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/CentralAuth/includes/specials/SpecialGlobalGroupMembership.php: T203767 - If2bfa092b (duration: 00m 50s)

2018-10-13

  • 23:37 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T45086 - I4857e8ac (duration: 00m 51s)
  • 03:07 bblack: eqsin repooled

2018-10-12

  • 18:56 brion: restarted vp9 background transcodes in eqiad, via mwmaint1002
  • 18:37 addshore: modified attachLatest.php script finished running over 9395 pages T206743
  • 18:25 addshore: running modified attachLatest.php script over ~9000 pages on wikidatawiki (with added wait for slaves) T206743
  • 15:50 mutante: repair /dev/sde1 on ms-be2041 - T199198
  • 15:48 mutante: repair /dev/sdh1 on ms-be1043 - T199198
  • 14:23 _joe_: depooling eqsin via geodns due to loss of power redundancy
  • 13:35 gehel: repooling wdqs1003, caught up on lag
  • 12:59 gehel: depooling wdqs1003 to catch up on lag
  • 12:20 bblack: uploading gdnsd 2.99.9942-beta-1+wmf1 to stretch-wikimedia
  • 10:51 _joe_: depooling mw2252 for mcrouter tests T203786
  • 10:27 hoo: Updated the Wikidata property suggester with data from Monday's JSON dump and applied the T132839 workarounds
  • 10:08 addshore@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikimediaEvents/extension.json: T205283 gerrit:466843 Update Schema:WMDEBannerEvents rev to 18437830 (duration: 00m 52s)
  • 09:01 elukey: rolling restart of eventbus on kafka[1,2]00[1-3] to pick up python security upgrades
  • 05:54 moritzm: installing git security updates on trusty
  • 02:25 ejegg: updated fundraising tools from 3754f32 to 5a2d39b

2018-10-11

  • 23:33 Reedy: ran mwscript extensions/ShortUrl/populateShortUrlTable.php --wiki=gomwiki T206741
  • 23:32 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable shorturl on gomwiki (duration: 00m 48s)
  • 23:30 Reedy: created shorturl table on gomwiki T206741
  • 23:26 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable FileExporter to Meta-Wiki (duration: 00m 49s)
  • 23:21 reedy@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Disable CongressLookup (duration: 00m 49s)
  • 23:05 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/jobqueue/jobs/ThumbnailRenderJob.php: T203135 - Ib4640e (duration: 00m 49s)
  • 22:56 dzahn@neodymium: conftool action : set/pooled=inactive; selector: name=mwmaint1001.eqiad.wmnet
  • 22:53 mutante: netbox - correction, mwmaint1001 to status "Staged", following new lifecycle docs T192457
  • 22:50 mutante: netbox - renamed mwmaint1001 to mw1279, changed status to inventory, renamed in DNS - T192457
  • 22:45 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/Revision/RenderedRevision.php: I553dba13486 (duration: 00m 51s)
  • 22:30 mutante: mwmaint1001 - shutting down after final backup of /home, renaming back to mw1297 in DNS and DHCP, and reinstalling (T192457)
  • 21:53 mutante: mwmaint1001 - scheduled downtime, is being renamed back to mw1297 and reinstalled
  • 21:47 mutante: mwmaint2001 - rsyncing home dirs from mwmaint1002 to /root/home-mwmaint1002 (which includes home-terbium even!) in case anyone is missing anything from one of mwmaint*
  • 21:41 mutante: mwmaint2001 - deleting 60G of unneeded files from home
  • 20:37 XioNoX: add IPv6 to mr1-ulsfo OOB - T206778
  • 18:46 sbisson@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/: SWAT: Handle pages that are unnominated for deletion (duration: 00m 50s)
  • 18:34 sbisson@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/modules/ext.pageTriage.views.list/ext.pageTriage.listControlNav.js: SWAT: Default to deleted and others when no type is selected on mode switch (duration: 00m 50s)
  • 18:22 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Remove config for RCFilters variables being removed from Core (duration: 00m 49s)
  • 18:14 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2083 and db2085:3318 (duration: 00m 48s)
  • 18:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1101:3318 (duration: 00m 49s)
  • 18:09 sbisson@deploy1001: Synchronized wmf-config/CommonSettings.php: SWAT: Add copyviobot group management to relevant wikis (duration: 00m 49s)
  • 17:36 gehel: repooling wdqs1003, caught up on lag
  • away: automated binlog purging started on pc2004, pc2005, pc2006
  • 16:54 gehel: depooling wdqs1003 to let it catch up on lag
  • 15:38 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1087 (duration: 00m 50s)
  • 15:12 marostegui: Stop MySQL on db2085:3318 to reclone db1101:3318 - T206743
  • 15:11 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1101:3318 (duration: 00m 49s)
  • 15:04 akosiaris: Media storage/Swift Swift set to active/passive
  • 15:01 akosiaris: Media storage/Swift Swift set to active/active
  • 14:56 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1099:3318 (duration: 00m 48s)
  • 14:52 jynus: deploying wikidata row fix to db1087 with replication enabled
  • 14:47 END: (PASS) - Cookbook sre.switchdc.services.02-restore-ttl (exit_code=0) (volans@neodymium)
  • 14:47 START: - Cookbook sre.switchdc.services.02-restore-ttl (volans@neodymium)
  • 14:36 END: (PASS) - Cookbook sre.switchdc.services.01-switch-dc (exit_code=0) (volans@neodymium)
  • 14:36 Switching: services parsoid, restbase, restbase-async, mobileapps, apertium, citoid, cxserver, eventstreams, graphoid, mathoid, proton, pdfrender, recommendation-api, zotero, eventbus, ores, wdqs, wdqs-internal: codfw => eqiad (volans@neodymium)
  • 14:36 START: - Cookbook sre.switchdc.services.01-switch-dc (volans@neodymium)
  • 14:35 END: (PASS) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=0) (volans@neodymium)
  • 14:30 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T206743: mariadb: Depool db1087 (duration: 00m 49s)
  • 14:30 START: - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (volans@neodymium)
  • 14:28 banyek: depooling db1087 (T206743)
  • 14:15 elukey: reboot eventlog1002 for kernel upgrades
  • 14:15 jynus: applying row filling to (most) eqiad s8 dbs, including the master
  • 14:13 moritzm: install libxml2 security updates on jessie servers
  • 13:55 jynus: recovering rows to db1092
  • 13:26 jynus: filling in missing rows on dbstore1002
  • 13:23 marostegui: Stop MySQL on db2083 to reclone db1116:3318 - T206743
  • 13:21 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2083 (duration: 00m 49s)
  • 13:20 marostegui: Stop MySQL on db1116:3318 to reclone it from db2083 - T206743
  • 12:43 elukey: upgrade prometheus-memcached-exporter on mc1*
  • 12:38 elukey: upgrade prometheus-memcached-exporter on mc2*
  • 12:15 elukey: upgrade prometheus-memcached-exporter on mc2035
  • 12:14 elukey: upload prometheus-memcached-exporter_0.4.1+git20181010.2fa99eb-1 to (jessie|stretch)-wikimedia
  • 12:12 Amir1: EU SWAT is done
  • 11:30 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Set some small wikis to read new for change tag backend (T194164) (duration: 00m 50s)
  • 11:10 marostegui: Stop MySQL on db2085:3318 and db1099:3318 T206743
  • 11:09 marostegui: Stop MySQL on db2088:3318 and db1099:3318 T206743
  • 11:08 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085:3318 and db1099:3318 (duration: 00m 49s)
  • 11:07 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db2085:3318 and db1099:3318 (duration: 00m 49s)
  • 11:07 banyek: binlog expiration set to 60 days on db2045
  • 08:30 banyek: setting up some automated binlog purge mechanism on pc1004,pc1005,pc1006
  • 08:26 jynus: setting up replication from pc2005 -> pc1005 and from pc2006 -> pc1006
  • 08:20 jynus: setting up replication from pc2004 -> pc1004
  • 08:04 banyek: purging binary logs on pc1006
  • 08:04 banyek: purging binary logs on pc1005
  • 08:04 jynus: running /usr/local/bin/mwscript purgeParserCache.php --wiki=aawiki --age=1900800 --msleep 0
  • 08:04 banyek: purging binary logs on pc1004
  • 07:57 gehel: rolling restart blazegraph on wdqs-internal for config change - T206648
  • 07:43 addshore: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/466031 to mwmaint1002 only (increasing tracking of wikidata dispatching) T205865
  • 07:36 elukey: roll restart of aqs on aqs100[4-9] to pick up new Druid settings
  • 06:00 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Increase db1092 weight (duration: 00m 49s)
  • 05:43 marostegui: Purge binary logs on pc2005 due to disk space issues - T206740
  • 05:29 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 48s)
  • 05:24 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 51s)
  • 02:25 krinkle@deploy1001: Synchronized w/static.php: T127233 - Ic6acb70 (duration: 00m 49s)
  • 02:10 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/page/WikiPage.php: T203942 - Ib211d98498f (duration: 00m 49s)
  • 02:07 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/tests/phpunit/includes/page/: Ib211d98498f (duration: 00m 49s)
  • 01:38 krinkle@deploy1001: Synchronized wmf-config/etcd.php: T176370 - I5e7e5d167d517 (duration: 00m 55s)

2018-10-10

  • 23:08 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/maintenance/resources/foreign-resources.yaml: Ic865e7077d (duration: 00m 49s)
  • 22:59 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/MultimediaViewer/: T206099 - I53dbce0a (duration: 00m 49s)
  • 22:43 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/specials/SpecialDeletedContributions.php: T187619 - Ic6b0d8020553 (duration: 00m 48s)
  • 22:41 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ORES/includes/FetchScoreJob.php: T204753 - Icc28230585bc (duration: 00m 49s)
  • 22:25 mutante: icinga1001 - chmod 2710 /var/lib/icinga/rw
  • 22:16 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: T206092 - If607ad111a (duration: 00m 48s)
  • 21:51 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ContentTranslation/specials/SpecialContentTranslation.php: T205433 - Ib34b28 (duration: 00m 49s)
  • 21:48 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/Echo/includes/DiscussionParser.php: T204291 - Ia5323b401b94 (duration: 00m 51s)
  • 21:45 XioNoX: Add icinga1001 to mr* security policies - T206704
  • 20:34 thcipriani: upgrading ci jenkins install on contint1001
  • 20:19 thcipriani: upgrading releases-jenkins jenkins install on releases1001
  • 20:17 thcipriani: upgrading releases-jenkins jenkins install on releases2001
  • 19:58 mutante: icinga - enabled icinga service on icinga1001 (stretch), but all notifications are disabled
  • 19:43 mutante: awight restarted ORES celery workers on ores2003 (~17:00), ores200* (17:05)
  • 19:35 kaldari@deploy1001: Finished scap: (no justification provided) (duration: 22m 05s)
  • 19:13 kaldari@deploy1001: Started scap: (no justification provided)
  • 19:11 kaldari: scap sync to rebuild i18n cache
  • 18:35 XioNoX: disable VC port 1/2 on asw2-c-eqiad:fpc3 (to fpc8)
  • 18:20 otto@deploy1001: Finished deploy [analytics/refinery@28bbee8]: Add accept header to webrequest logs - T170606 (duration: 10m 34s)
  • 18:19 XioNoX: delete sessions to AS6805 on cr2-esams (left AMS-IX)
  • 18:10 otto@deploy1001: Started deploy [analytics/refinery@28bbee8]: Add accept header to webrequest logs - T170606
  • 18:09 otto@deploy1001: Finished deploy [analytics/refinery@4e2d956]: Add accept header to webrequest logs - T170606 (duration: 04m 35s)
  • 18:05 otto@deploy1001: Started deploy [analytics/refinery@4e2d956]: Add accept header to webrequest logs - T170606
  • 17:49 XioNoX: replace 10.195.0.0/25 with 10.195.0.0/24 in prefix-list fundraising-codfw4 on cr1/2-codfw - T206637
  • 16:25 mutante: LDAP - added isaacj to wmf group (for SWAP access, existing shell user since recently) (T206631) (T205840)
  • 16:16 _joe_: restart of now-unused jobqueue redises for stopping the alerts post-switchover
  • 16:09 ejegg: updated CiviCRM from 1165e7ed79 to 4cc21d61c5
  • 15:59 vgutierrez: Uploaded certcentral 0.1 to apt.wikimedia.org (stretch) - T199711
  • 15:55 cmjohnson1: scheduled downtime for host cloudvirt1019 to swap RAID card T196507
  • 15:35 moritzm: uploaded jenkins 2.138.2 security release to apt.wikimedia.org (jessie/stretch) (T206234)
  • 15:11 _joe_: started hhvm again on mwmaint2001
  • 14:51 ejegg: turned fundraising scheduled jobs back on
  • 14:43 ejegg: turned off fundraising scheduled jobs
  • 14:42 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0) (volans@neodymium)
  • 14:42 START: - Cookbook sre.switchdc.mediawiki.08-restore-ttl (volans@neodymium)
  • 14:42 END: (FAIL) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=99) (volans@neodymium)
  • 14:40 START: - Cookbook sre.switchdc.mediawiki.08-start-maintenance (volans@neodymium)
  • 14:39 END: (FAIL) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=99) (volans@neodymium)
  • 14:38 START: - Cookbook sre.switchdc.mediawiki.08-start-maintenance (volans@neodymium)
  • 14:33 oblivian@puppetmaster1001: conftool action : set/weight=15; selector: cluster=api_appserver,service=apache2,dc=eqiad,name=mw123.*
  • 14:31 oblivian@puppetmaster1001: conftool action : set/weight=15; selector: cluster=api_appserver,service=apache2,dc=eqiad,name=mw122.*
  • 14:19 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0) (volans@neodymium)
  • 14:19 START: - Cookbook sre.switchdc.mediawiki.08-update-tendril (volans@neodymium)
  • 14:18 END: (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) (volans@neodymium)
  • 14:18 MediaWiki: read-only period ends at: 2018-10-10 14:18:26.908958 (volans@neodymium)
  • 14:18 START: - Cookbook sre.switchdc.mediawiki.07-set-readwrite (volans@neodymium)
  • 14:18 END: (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) (volans@neodymium)
  • 14:18 START: - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (volans@neodymium)
  • 14:17 END: (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0) (volans@neodymium)
  • 14:17 START: - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (volans@neodymium)
  • 14:17 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-traffic (exit_code=0) (volans@neodymium)
  • 14:15 START: - Cookbook sre.switchdc.mediawiki.04-switch-traffic (volans@neodymium)
  • 14:15 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) (volans@neodymium)
  • 14:14 START: - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (volans@neodymium)
  • 14:14 END: (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) (volans@neodymium)
  • 14:14 START: - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (volans@neodymium)
  • 14:14 END: (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) (volans@neodymium)
  • 14:13 MediaWiki: read-only period starts at: 2018-10-10 14:13:46.068081 (volans@neodymium)
  • 14:13 START: - Cookbook sre.switchdc.mediawiki.02-set-readonly (volans@neodymium)
  • 14:10 END: (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) (volans@neodymium)
  • 14:10 START: - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (volans@neodymium)
  • 14:10 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 14:07 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 14:07 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 14:05 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 14:05 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 14:01 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 14:01 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) (volans@neodymium)
  • 14:01 START: - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (volans@neodymium)
  • 14:00 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) (volans@neodymium)
  • 14:00 START: - Cookbook sre.switchdc.mediawiki.00-disable-puppet (volans@neodymium)
  • 12:18 _joe_: decommissioning conf1001-1003: stopping etcd, nginx, and masking both
  • 11:41 jynus: renaming some s3 wiki tables on eqiad master to prevent split brain T184805
  • 11:29 zeljkof: EU SWAT finished
  • 11:26 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Permissions changes on itwikibooks (T206447) (duration: 00m 57s)
  • 10:54 marostegui: Set a replication filter on db1075 (s3 eqiad) to ignore enwikivoyage, cebwiki, shwiki, srwiki & mgwiktionary - T184805
  • 10:49 marostegui@deploy1001: Synchronized dblists/s5.dblist: Update s5.dblist to reflect the wikis moved from s3 - T184805 (duration: 00m 56s)
  • 10:48 marostegui@deploy1001: Synchronized dblists/s3.dblist: Update s3.dblist to reflect the wikis moved to s5 - T184805 (duration: 00m 58s)
  • 09:12 ema: Traffic: move restbase back to eqiad T203777
  • 09:07 ema: Traffic: set services active/active T203777
  • 09:00 ema: Traffic: route esams caches back to eqiad T203777
  • 08:27 moritzm: installing fuse security updates
  • 08:07 ariel@deploy1001: Finished deploy [dumps/dumps@0714a93]: fix adds/changes dumps generation when prev run is missing (duration: 00m 06s)
  • 08:07 ariel@deploy1001: Started deploy [dumps/dumps@0714a93]: fix adds/changes dumps generation when prev run is missing
  • 08:01 moritzm: rolling out debdeploy 0.0.99.6
  • 07:51 elukey: cleaned up some log files from eventlog1002
  • 02:55 ejegg: updated payments-wiki from 1472604b6e to 7fb1aae963
  • 00:19 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: T94522 - I2a0c51bea58 (duration: 00m 56s)
  • 00:15 krinkle@deploy1001: sync-file aborted: T205567 - I75f1eb6dc2cb (duration: 00m 01s)
  • 00:14 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/tests/phpunit/includes/utils/: T94522 - I2a0c51bea58 (duration: 01m 02s)

2018-10-09

  • 22:58 SMalyshev: repooled wdqs2003
  • 22:26 shdubsh: repairing /dev/sdl1 on ms-be2040 - T199198
  • 21:52 bblack: cp1085: varnish backend restart for mbox lag
  • 21:50 mutante: releases1001 - restarted jenkins (it went from 200 -> 503 -> 403); curl localhost:8080 works again after restart, but the icinga check is still getting 403
  • food: updated fundraising CiviCRM from 7a0d14015e to 1165e7ed79
  • 20:08 mutante: repair /dev/sdg1 on ms-be2041 - T199198
  • 19:37 XioNoX: disable igmp-snooping on asw2-c-eqiad - T201039
  • 19:25 XioNoX: disable igmp-snooping on asw2-b-eqiad - T201039
  • 19:20 XioNoX: bounce igmp-snooping on asw2-b-eqiad
  • 18:24 ottomata: adding Accept header to all varnishkafka generated webrequest logs
  • 17:21 SMalyshev: depooled wdqs2003 again, sigh
  • 13:54 moritzm: rebooting prometheus1004 for kernel security update
  • 13:41 moritzm: rebooting prometheus1003 for kernel security update
  • 13:28 moritzm: rebooting prometheus2004 for kernel security update
  • 13:13 moritzm: rebooting prometheus2003 for kernel security update
  • 12:54 gehel: silencing wdqs-public lag alerts (service still functional, and SLO unclear) - T199228
  • 12:45 moritzm: installing imagemagick security updates
  • 11:47 END: (ERROR) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=2) (volans@neodymium)
  • 11:47 START: - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (volans@neodymium)
  • 11:45 akosiaris: dry-run services switchover from codfw to eqiad in preparation for Thursday
  • 11:37 END: (ERROR) - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (exit_code=2) (volans@neodymium)
  • 11:37 START: - Cookbook sre.switchdc.services.00-reduce-ttl-and-sleep (volans@neodymium)
  • 11:14 volans: live-test of the inverted switchdc (eqiad->codfw) completed, all good - T203777
  • 11:14 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-update-tendril (exit_code=0) (volans@neodymium)
  • 11:13 START: - Cookbook sre.switchdc.mediawiki.08-update-tendril (volans@neodymium)
  • 11:12 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) (volans@neodymium)
  • 11:11 START: - Cookbook sre.switchdc.mediawiki.08-start-maintenance (volans@neodymium)
  • 11:11 END: (PASS) - Cookbook sre.switchdc.mediawiki.08-restore-ttl (exit_code=0) (volans@neodymium)
  • 11:11 START: - Cookbook sre.switchdc.mediawiki.08-restore-ttl (volans@neodymium)
  • 11:11 END: (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) (volans@neodymium)
  • 11:11 [DRY-RUN]: MediaWiki read-only period ends at: 2018-10-09 11:11:05.042622 (volans@neodymium)
  • 11:11 START: - Cookbook sre.switchdc.mediawiki.07-set-readwrite (volans@neodymium)
  • 11:08 END: (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) (volans@neodymium)
  • 11:08 START: - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (volans@neodymium)
  • 11:07 END: (PASS) - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (exit_code=0) (volans@neodymium)
  • 11:07 START: - Cookbook sre.switchdc.mediawiki.05-invert-redis-sessions (volans@neodymium)
  • 11:06 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-traffic (exit_code=0) (volans@neodymium)
  • 11:04 START: - Cookbook sre.switchdc.mediawiki.04-switch-traffic (volans@neodymium)
  • 11:03 END: (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) (volans@neodymium)
  • 11:03 START: - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (volans@neodymium)
  • 11:00 END: (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) (volans@neodymium)
  • 10:59 START: - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (volans@neodymium)
  • 10:56 END: (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) (volans@neodymium)
  • 10:56 [DRY-RUN]: MediaWiki read-only period starts at: 2018-10-09 10:56:12.213026 (volans@neodymium)
  • 10:56 START: - Cookbook sre.switchdc.mediawiki.02-set-readonly (volans@neodymium)
  • 10:53 END: (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) (volans@neodymium)
  • 10:53 START: - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (volans@neodymium)
  • 10:51 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 10:49 onimisionipe: repooling wdqs2001, caught up on lag - T206423
  • 10:48 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 10:47 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-warmup-caches (exit_code=0) (volans@neodymium)
  • 10:41 START: - Cookbook sre.switchdc.mediawiki.00-warmup-caches (volans@neodymium)
  • 10:40 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) (volans@neodymium)
  • 10:40 START: - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (volans@neodymium)
  • 10:37 END: (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) (volans@neodymium)
  • 10:36 START: - Cookbook sre.switchdc.mediawiki.00-disable-puppet (volans@neodymium)
  • 10:35 onimisionipe: deploying prometheus-blazegraph-exporter 0.6 on all wdqs clusters - T206123
  • 10:34 volans: about to perform live-test of the inverted switchdc (eqiad->codfw), actions will be real but basically noop due to codfw being already active - T203777
  • 09:25 elukey: swapped Hadoop's hive/oozie from analytics1003 to an-coord1001
  • 09:16 ema: restart pybal on lvs1005 to pick up config changes (conf2001 -> conf1004)
  • 09:00 ema: re-enable puppet/pybal on lvs1002, IPv6 connectivity with phab1001 working again T201039
  • 08:16 elukey: update puppet compiler facts
  • 08:06 onimisionipe: depooling wdqs2001 to catch up on lag - T206423
  • 07:03 akosiaris: restart zuul and zuul-merger on contint1001 for the upgrade of zuul to finish
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1122 (duration: 00m 57s)
  • 05:19 marostegui: Stop MySQL on db1122 for binlog format change, mysql and kernel upgrade
  • 05:14 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1122 (duration: 00m 59s)
  • 02:41 krinkle@deploy1001: Synchronized wmf-config/profiler.php: T176916 / T206092 - Ie86e88777c48 (duration: 00m 56s)
  • 02:21 krinkle@deploy1001: Synchronized wmf-config/arclamp.php: T176916 - Id79baae90: ensure file exists before Ie86e88777c48 (duration: 00m 57s)
  • 00:04 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/libs/rdbms/database: T201900 - I8ae754a2518 (duration: 00m 59s)

2018-10-08

  • 22:45 XioNoX: increase accepted-prefix-limit for 24115 on cr4-ulsfo
  • 22:41 XioNoX: clear BGP neighbor cr1-eqsin:AS9583 (bgp limit threshold reached)
  • 21:11 ejegg: updated payments-wiki from d623de9494 to 1472604b6e
  • 20:42 gehel: repooling wdqs2003, caught up on lag - T206423
  • 19:41 XioNoX: troubleshooting asw2-b-eqiad with JTAC - T201039
  • 19:08 gehel: depooling wdqs2003 to catch up on lag - T206423
  • 19:00 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable MCR read-new mode on some small wikis (T198308) (duration: 00m 56s)
  • 18:55 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@bd698bd]: WDQS deployment - New federation whitelist entries (duration: 10m 07s)
  • 18:45 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@bd698bd]: WDQS deployment - New federation whitelist entries
  • 18:37 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@bd698bd]: WDQS test deployment - New federation whitelist entries (wdqs1009) (duration: 00m 33s)
  • 18:37 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@bd698bd]: WDQS test deployment - New federation whitelist entries (wdqs1009)
  • 18:36 tgr@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable Extension:File exporter to mrwikipedia (T206437) (duration: 00m 57s)
  • 16:29 XioNoX: push firewall filter counters on asw2-b-eqiad - T201039
  • 16:28 elukey: restart eventlogging on eventlog1002 for python security upgrades
  • 14:05 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T184805: Revert 'mariadb: Depool db1110 for testing s3 imports' (duration: 00m 57s)
  • 14:03 banyek@deploy1001: Synchronized wmf-config/db-eqiad.php: T184805: Revert 'mariadb: Depool db1110 for testing s3 imports' (duration: 00m 56s)
  • 13:43 elukey: restart confd on esams nodes to pick up new srv settings
  • 13:41 elukey: restart navtiming.service on webperf1001 to pick up the dns change for etcd
  • 13:39 marostegui: Enable gtid on the following slaves: db2068 db1122 db1117:3323
  • 13:37 elukey: restart confd on all the other eqiad nodes to pick up new srv records
  • 13:32 elukey: restart confd on cp1* to pick up new srv records
  • 13:11 _joe_: purging the dnsrec cache for eqiad,esams etcd client SRV records
  • 13:09 ema: depool eqiad front-edge traffic T201039
  • 13:05 banyek: converting cebwiki.templatelinks to TokuDB on host dbstore1002.eqiad.wmnet (T205544)
  • 13:04 banyek: downtime notifications for dbstore1002 replication threads (T205544)
  • 12:49 banyek: pt-kill-wmf enabled on the wikireplicas (T203674)
  • 11:59 _joe_: restart pybal in esams, after running puppet, to switch etcd cluster used
  • 11:46 _joe_: restart pybal on lvs1001
  • 11:46 addshore: SWAT done
  • 11:45 addshore@deploy1001: Synchronized wmf-config/throttle.php: Add throttle exception for Netherlands Hackathon October 2018 - Wiki Techstorm T206241, and remove other rules. (duration: 00m 56s)
  • 11:39 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary --fix --add-prefix=T202769 # T202769
  • 11:35 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary --fix # Finished, still 111 pages to fix
  • 11:34 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary --fix # Started
  • 11:33 addshore: addshore@mwmaint2001:~$ mwscript namespaceDupes.php --wiki fywiktionary # (dryrun, 11529 links to fix, 11529 were resolvable.)
  • 11:32 addshore@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: gerrit:455249 Use translated MetaNamespace for fy.wiktionary T202769 (duration: 00m 58s)
  • 11:27 addshore@deploy1001: Synchronized wmf-config/flaggedrevs.php: SWAT: gerrit:464890 Remove the "reviewer" group at ruwikisource T205997 (duration: 00m 57s)
  • 10:41 elukey: restart mcrouter on mw2201 with more verbose logging settings as test
  • 09:55 moritzm: installing python3.5/python2.7 security updates
  • 09:51 godog: rebuild sdc sdh sdj sdi on ms-be2041 with crc=1 finobt=0 - T199198
  • 08:20 marostegui: Disable gtid on es2 and es3 eqiad master
  • 08:20 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2001.codfw.wmnet
  • 08:20 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2002.codfw.wmnet
  • 07:50 marostegui: Enabling replication eqiad -> codfw in preparation for DC failover
  • 07:40 marostegui: Disable GTID on s1,s2,s3,s4,s6,s7,s8 eqiad masters in preparation for enabling replication eqiad -> codfw
  • 07:39 _joe_: disabling puppet, doing etcd tests on lvs1006
  • 07:38 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2002.eqiad.wmnet
  • 07:38 gehel@puppetmaster1001: conftool action : set/weight=15; selector: dc=codfw,cluster=wdqs,name=wdqs2001.eqiad.wmnet
  • 07:38 gehel: reducing relative weight of wdqs2003 in pybal - T206423
  • 07:27 banyek: enabling wmf-pt-kill on labsdb1010 for the first time
  • 07:20 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1092 with low weight - T205514 (duration: 01m 27s)
  • 07:00 moritzm: installing git security updates

2018-10-07

  • 16:40 dereckson: Reset user email for account "Dominic Mayers" (T206421)
  • 16:35 elukey: run a script in tmux (my username) on mw2201 to poll the status of a mcrouter key/route every 10s using its admin api (very lightweight but kill if needed)
  • 14:52 onimisionipe: repooling wdqs2003, caught up on lag. Lag issues also seem to be creeping up on wdqs200[1|2]
  • 04:29 SMalyshev: temp depooled wdqs2003
  • 03:12 ejegg: disabled all fundraising scheduled jobs - something that looks like disk issues on civi1001

2018-10-06

  • 21:20 gehel: repooling wdqs2003: caught up on updater lag
  • 20:43 _joe_: restarting apache2 on puppetmaster1001
  • 19:16 onimisionipe: depooling wdqs2003
  • 18:10 elukey: restart Yarn Resource Manager on an-master1002 to force an-master1001 to take the active role back (failed over due to a zk conn issue)
  • 17:07 onimisionipe: restarting wdqs-blazegraph on wdqs2003
  • 13:48 bblack: multatuli: update gdnsd package to 2.99.9930-beta-1+wmf1
  • 13:47 bblack: authdns1001: update gdnsd package to 2.99.9930-beta-1+wmf1 (correction to last msg)
  • 13:46 bblack: authdns1001: update gdnsd package to 2.99.9161-beta-1+wmf1
  • 12:57 bblack: rebooting cp1076
  • 12:49 bblack: depool cp1076, apparently has disk issues

2018-10-05

  • 23:50 bblack: <<<<<<< repooling eqiad edge caches, a few days ahead of intended switchback next Weds, to alleviate some traffic engineering concerns over the weekend >>>>>>
  • 20:48 mutante: T191183 - it's still showing the error page as before but that isn't due to apache issues, it just needs additional ferm rules
  • 20:44 mutante: gerrit - adding gerrit.wmfusercontent.org virtual host for avatars. applied first on gerrit2001, then on cobalt (T191183)
  • 20:03 ejegg: updated fundraising CiviCRM from ebc2e0076c to 7a0d14015e
  • 19:48 banyek: repooling labsdb1009 (T195747)
  • 19:44 smalyshev@deploy1001: Finished deploy [wdqs/wdqs@f8776de]: Redeploy 1009 (duration: 00m 26s)
  • 19:44 smalyshev@deploy1001: Started deploy [wdqs/wdqs@f8776de]: Redeploy 1009
  • 18:37 bblack: authdns2001: upgraded gdnsd to 2.99.9930-beta
  • 18:31 bblack: gdnsd-2.99.9930-beta-1+wmf1 uploaded to stretch-wikimedia
  • 18:26 mutante: icinga - noop on all servers, no change, puppet re-enabled, operations normal
  • 18:08 mutante: disabling puppet on icinga for 5 min for extra safety before a change that should be noop
  • 17:58 banyek: depooling labsdb1009 (T195747)
  • 17:50 banyek: repooling labsdb1011 (T195747)
  • 17:12 elukey: set etcd in codfw as read/write (was readonly) and eqiad as readonly (was read/write)
  • 14:57 banyek: depooling labsdb1011 (T195747)
  • 14:56 banyek: depooling labsdb1011
  • 13:26 banyek: adding wmf-pt-kill_2.2.20-1+wmf3 package for stretch
  • 13:25 moritzm: installing python3.5/2.7 security updates
  • 13:02 volans: upgraded spicerack to version 0.0.9 on sarin/neodymium/cumin* - T199079
  • 12:13 vgutierrez: Creating certcentral1001.eqiad.wmnet in ganeti - T206308
  • 12:12 vgutierrez: Creating certcentral2001.codfw.wmnet in ganeti - T206308
  • 11:59 elukey: deleted bohrium from ganeti via gnt-instance
  • 11:43 moritzm: rebooting wezen for kernel security update
  • 11:29 moritzm: rebooting ruthenium for kernel security update
  • 10:40 jynus: restarting replication on labsdb1010/1 on s3 and s5
  • 10:37 volans: uploaded spicerack_0.0.9-1{,+deb9u1} to apt.wikimedia.org {jessie,stretch}-wikimedia - T199079
  • 10:17 moritzm: rearmed keyholder on netmon2001
  • 10:10 elukey: restart confd on labs-puppetmaster to pick up new etcd settings (eqiad -> codfw)
  • 10:03 _joe_: restarting navtiming.service on webperf1001 to pick up the dns change for etcd
  • 09:37 elukey: restart rsyslog on lithium - broken connection to tegmen - T199406
  • 09:37 banyek: disabling puppet on labsdb1009,labsdb1010,labsdb1011 (T203674)
  • 09:36 banyek: adding wmf-pt-kill_2.2.20-1+wmf2 package for stretch
  • 09:16 volans: rebooting tegmen, console stuck, possible re-occurrence of T199413 (to be confirmed)
  • 09:12 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Move some wikis for s3 to s5 (duration: 00m 56s)
  • 09:06 elukey: stop etcdmirror replication on conf2002
  • 09:05 _joe_: restarting confd on all nodes in eqiad and esams
  • 08:58 _joe_: wiped cached values for the read-only etcd SRV record
  • 08:56 _joe_: read-write connections to etcd only go to codfw now
  • 08:35 _joe_: reenabling notifications for etcdmirror on conf1005
  • 08:02 jynus: start replication on db1069 (x1)
  • 07:54 jynus: starting replications on db1075; db1070, db1070:s3 with disabled gtid
  • 07:50 jynus: stopping dbstore1001:x1
  • 07:33 jynus: changing s3 master for db1070
  • 07:28 jynus: stopping s3 replication on db1070
  • 07:20 jynus: stopping x1 replication on db1069
  • 07:20 godog: temporarily stop prometheus on bast4001 to finalize data transfer - T179050
  • 07:19 jynus: stopping s3 replication on db1075
  • 07:18 jynus: stopping s5 replication on db1070
  • 07:09 moritzm: installing python3.4/2.7 security updates
  • 05:55 krinkle@deploy1001: Synchronized wmf-config/InitialiseSettings.php: T205599 - Ic28e00c30 (duration: 00m 57s)
  • 05:53 _joe_: upgrading python-etcd on conf1004-6, restarting etcdmirror
  • 05:13 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Clarify db1092 status - T205514 (duration: 00m 57s)
  • 04:18 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/includes/libs/filebackend/FileBackendStore.php: T205567 - I75f1eb6dc2cb (duration: 00m 56s)
  • 04:16 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/CirrusSearch/includes/DataSender.php: I0769c50c (duration: 01m 01s)
  • 00:31 mutante: LDAP: added user skvjold to group wmf (T204377)

2018-10-04

  • 22:51 ejegg: updated fundraising CiviCRM from 944b954bac to ebc2e0076c
  • 21:27 XioNoX: bounce phab1001 switch port - T201039
  • 20:47 ejegg: updated fundraising CiviCRM from ddf4865650 to 944b954bac
  • 20:23 mforns@deploy1001: Finished deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76 (duration: 00m 17s)
  • 20:22 mforns@deploy1001: Started deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76
  • 20:10 mforns@deploy1001: Finished deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76 (duration: 14m 04s)
  • 19:56 mforns@deploy1001: Started deploy [analytics/refinery@3eb9bf2]: deploying refinery together with refinery-source v0.0.76
  • 19:30 marxarelli: rise in fatals "Fatal error: entire web request took longer than 60 seconds and timed out in /srv/mediawiki/php-1.32.0-wmf.24/includes/Title.php"
  • 19:26 dduvall@deploy1001: rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.24
  • 19:15 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@6dc89c0]: Bump cirrusSearchLinksUpdate concurrency to 50 (duration: 00m 53s)
  • 19:14 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@6dc89c0]: Bump cirrusSearchLinksUpdate concurrency to 50
  • 18:49 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:460202|]] (duration: 00m 59s)
  • 18:24 XioNoX: bounce lvs1002:eth1 switch port
  • 18:23 sbisson@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Enable PageTriage/ORES on enwiki (T206149) (duration: 01m 01s)
  • 18:21 bblack: lvs1002: puppet disabled, stopping pybal (fail to 1005)
  • 18:07 _joe_: disabled notifications for etcd replication lag on conf1005, not in production
  • 17:47 banyek: repooling labsdb1010 (T195747)
  • 17:41 _joe_: uploaded new python-etcd packages for jessie, stretch
  • 17:38 XioNoX: asw2-b-eqiad recabling done - T201039
  • 17:34 elukey: pool kafka1002 (eventbus) after maintenance
  • 17:22 elukey: re-enable ircecho after alarms shower
  • 17:15 andrewbogott: triggering some alerts on labvirt1018 to figure out about alert thresholds
  • 17:06 elukey: stop ircecho on einsteinium - alarms shower
  • 17:02 gtirloni: tools - published updated toollabs-* Docker images
  • 16:54 ejegg: updated standalone SmashPig deploy from 82f9d49c23 to 5f21d3f2db
  • 16:52 XioNoX: Step 3) Add missing links - T201039
  • 16:45 shdubsh: etherpad1001 running systemctl reset-failed
  • 16:41 XioNoX: Connect/enable fpc2:0/51-fpc5:1/0 (5m DAC) - T201039
  • 16:39 XioNoX: Enable fpc5-fpc7 - T201039
  • 16:33 twentyafterfour: started phd on phab1001 and re-enabled puppet (I had it disabled to prevent starting phd during read-only)
  • 16:25 twentyafterfour: phabricator is read-write
  • 16:21 jynus: reloading dbproxy1003,8
  • 16:16 marostegui: Stop and reboot db1072 (phabricator master) for maintenance
  • 16:16 twentyafterfour: phabricator is read-only
  • 16:14 XioNoX: Enable all VC ports on FPC2 and FPC7 - T201039
  • 16:13 XioNoX: starting asw2-b-eqiad re-cabling - T201039
  • 16:08 twentyafterfour: logged downtime for phabricator in icinga, stopped phd queue processing in preparation for read-only mode
  • 16:07 jynus: reloading haproxy @ dbproxy1005
  • 16:00 marostegui: Stop MySQL on db1073 for mariadb and kernel upgrade - T201039 T148507
  • 15:58 arturo: icinga downtime every server in the main cloudvps deployment for 2h T201039
  • 15:56 arturo: icinga downtime every server with the cloudXXXX scheme for 2h T201039
  • 15:54 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@55dbb8b]: Proper reconnect on topics change T199444 (duration: 00m 55s)
  • 15:53 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@55dbb8b]: Proper reconnect on topics change T199444
  • 15:52 ppchelko@deploy1001: Finished deploy [changeprop/deploy@5d00448]: Proper reconnect on topics change T199444 (duration: 01m 40s)
  • 15:51 ppchelko@deploy1001: Started deploy [changeprop/deploy@5d00448]: Proper reconnect on topics change T199444
  • 15:41 elukey: depool kafka1002 from eventbus as precautionary step for T201039
  • 14:48 banyek: depooling labsdb1010 (T195747)
  • 14:09 marostegui: Sanitize enwikivoyage cebwiki shwiki srwiki mgwiktionary on db1124:3315 T184805
  • 13:46 pmiazga@deploy1001: Finished deploy [proton/deploy@ecb9a0e]: Bugfix:handle undefined response and fix grafana stats (T186748,T201158) (duration: 02m 55s)
  • 13:43 pmiazga@deploy1001: Started deploy [proton/deploy@ecb9a0e]: Bugfix:handle undefined response and fix grafana stats (T186748,T201158)
  • 13:14 banyek: muting alerts on s2 replication @dbstore2002 and resuming compression of s2 database tables (T204930)
  • 13:14 banyek: muting alerts on dbstore2002 and resuming compression of s2 database tables (T204930)
  • 12:23 elukey: deploy etcdmirror on conf1005 - T205814
  • 12:06 zeljkof: EU SWAT finished
  • 12:06 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add permission "move-rootuserpages" to usergroup "eliminator" at ptwiki (T205595) (duration: 00m 57s)
  • 12:01 moritzm: rolling reboot of ms-fe hosts in codfw for kernel security update
  • 12:00 zeljkof: one more patch for EU SWAT
  • 11:57 zeljkof: EU SWAT finished
  • 11:57 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add *.nasimonline.ir to wgCopyUploadsDomains whitelist for Commons (T203371) (duration: 00m 56s)
  • 11:52 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: add Radlines.org to $wgCopyUploadsDomains (T203219) (duration: 00m 57s)
  • 11:42 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add .bollywoodhungama.in to wgCopyUploadsDomains (T203363) (duration: 00m 57s)
  • 11:35 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Add some namespaces aliases for zhwikiversity (T201675) (duration: 00m 57s)
  • 11:27 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Change acewiki default time zone to Asia/Jakarta (T205693) (duration: 00m 56s)
  • 11:17 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create Photowalk and Photowalk Talk namespaces for bd.wikimedia.org (T205747) (duration: 00m 57s)
  • 10:44 twentyafterfour@deploy1001: Synchronized php-1.32.0-wmf.23/README: noop sync to verify that scap 3.8.7-1 works (at least on a basic level) (duration: 00m 59s)
  • 10:38 godog: upload scap 3.8.7-1 - T204383
  • 10:36 _joe_: uploading etcd-mirror to stretch-wikimedia T205814
  • 10:08 moritzm: rolling reboot of ms-fe hosts in eqiad for kernel security update
  • 09:13 arturo: T203177 schedule 8h icinga downtime for cloudcontrol1003,1004 and labmon1001
  • 08:52 moritzm: installing python2.7/python3.4/python3.5 security updates on jessie/stretch
  • 08:34 moritzm: installing ca-certificates updates for jessie/stretch
  • 08:09 marostegui: Restart icinga T196336
  • 08:00 gehel: re-enabling puppet on maps1004
  • 07:31 elukey: move Piwik/Matomo from bohrium to matomo1001 - T202962
  • 07:25 godog: reformat ms-be1041 with crc=1 finobt=0 - T199198
  • 06:57 jynus: starting multisource replication of s3 from s5 at eqiad master
  • 06:51 jynus: reenabling consistency configuration on s5 replica databases
  • 06:24 jynus: create manual backup of databases on eqiad s6, s7, s8, x1
  • 05:36 marostegui: Deploy schema change on db2048 (s1 master) - T205913
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2062 (duration: 00m 56s)
  • 05:30 marostegui: Deploy schema change on db2062 - T205913
  • 05:30 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2062 (duration: 00m 57s)
  • 04:04 SMalyshev: repooled wdqs2003
  • 03:22 SMalyshev: depool wdqs2003 to let it catch up
  • 03:21 SMalyshev: repooled wdqs2001
  • 03:16 ejegg: re-enabled PayPal EC orphan rectifier
  • 03:06 ejegg: updated CiviCRM from 80cb98e33e to ddf4865650
  • 02:43 SMalyshev: depooled wdqs2001 to see if it catches up faster
  • 01:54 ejegg: updated payments-wiki from 8b673cfb4f to d623de9494

2018-10-03

  • 23:54 mutante: scheduled downtime for wdqs as it's flapping and already known
  • 23:45 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/VisualEditor/: Require Parsoid HTML 2.0.0, and handle its <audio> tags (T201081); ext.visualEditor.mwlanguage: Actually load all of the code (T205834) (duration: 00m 57s)
  • 23:41 catrope@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/VisualEditor/: Require Parsoid HTML 2.0.0, and handle its <audio> tags (T201081) (duration: 00m 59s)
  • 23:29 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/PageTriage/: Hide copyvio AFC filter option behind flag (T205918) (duration: 00m 57s)
  • 23:23 catrope@deploy1001: Synchronized php-1.32.0-wmf.24/includes/utils/UIDGenerator.php: Make UID clock drift error have more details (T94522) (duration: 00m 58s)
  • 23:20 XenoRyet: shut off Paypal orphan rectifier
  • 23:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Bump Minerva A/B test rates to 100% on jawiki, ruwiki, fawiki (T200792) (duration: 00m 56s)
  • 22:49 shdubsh: re-enable puppet on einsteinium
  • 22:45 shdubsh: einsteinium: setting enable_notifications=1 and reloading icinga
  • 22:36 herron: herron@neodymium:~$ sudo cumin -b 15 -p 95 '*' 'run-puppet-agent -q --failed-only'
  • 22:20 shdubsh: einsteinium: setting enable_notifications=0 and starting icinga
  • 22:06 herron: herron@neodymium:~$ sudo cumin -b 40 -p 95 'R:file = /etc/nagios/nrpe_local.cfg' run-puppet-agent
  • 22:02 mutante: mw2242 - started nagios-nrpe-server
  • 22:01 shdubsh: icinga stopped manually
  • 21:57 mutante: einsteinium - disabling puppet
  • 21:25 bblack: upgraded gdnsd to 2.99.9161 on authdns1001
  • 21:17 dduvall@deploy1001: Synchronized php: group1 wikis to 1.32.0-wmf.24 (duration: 00m 55s)
  • 21:16 dduvall@deploy1001: rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.24
  • 21:12 dduvall@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/WikibaseQualityConstraints/src/ServiceWiring.php: deploying fix to 1.32.0-wmf.24 for T206161 (duration: 00m 57s)
  • 20:28 marxarelli: deployed proposed WikibaseQualityConstraints fix and wikiversions bump for wikidatawiki to mwdebug1001 and mwdebug1002 for verification (T206161)
  • 20:18 robh: optic swap on cr4-ulsfo:et-0/0/1
  • 20:03 bblack: upgraded gdnsd to 2.99.9161 on multatuli
  • 19:40 bblack: upgraded gdnsd to 2.99.9161 on authdns2001
  • 19:35 bblack: uploaded 2.99.9161-beta-1+wmf1 to stretch-wikimedia
  • 19:33 mateusbs17: running initial osm import in maps1004
  • 19:23 dduvall@deploy1001: Synchronized php: rollback group1 to 1.32.0-wmf.23 (duration: 00m 54s)
  • 19:18 dduvall@deploy1001: rebuilt and synchronized wikiversions files: rollback group1 to 1.32.0-wmf.23
  • 19:15 marxarelli: rolling back group1 after rapid rise in fatals
  • 19:14 dduvall@deploy1001: scap failed: average error rate on 6/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
  • 18:49 RoanKattouw: Deployed patches for T206130
  • 18:36 papaul: reinstalling OS on lvs2010
  • 18:16 mutante: lvs2010 - scheduled downtime for host and services for 12 hours for reinstall
  • 18:09 mutante: lvs2009 - schedule downtime in icinga for 4 hours, reinstall in progress
  • 18:08 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@d5bab41]: Bump cirrusSearchLinksUpdate concurrency to 20 (duration: 00m 57s)
  • 18:07 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@d5bab41]: Bump cirrusSearchLinksUpdate concurrency to 20
  • 18:07 XioNoX: disable ulsfo Zayo transit/transport links
  • 17:42 XioNoX: re-enable cr1-eqiad:ae1 - T201145
  • 17:28 XioNoX: start of recabling asw2-a-eqiad between asw and cr1 - T201145
  • 17:26 XioNoX: disable cr1-eqiad:ae1 - T201145
  • 17:10 papaul: reinstalling OS on lvs2009
  • 16:24 reedy@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/Flow/: fixup flow exporting T203424 (duration: 01m 03s)
  • 15:45 ejegg: updated fundraising CiviCRM from e3e1963915 to 80cb98e33e
  • 14:42 jynus: fixed some prometheus metrics grants on dbstore1001:3306, db1116:3317 and db1116:3318
  • 14:07 banyek: converting wikidatawiki.change_tag to TokuDB on host dbstore1002 (T205544)
  • 12:54 urandom: DROP unused RESTBase tables - T204752
  • 12:26 stephanebisson: Finished mwscript extensions/ORES/maintenance/BackfillPageTriageQueue.php --wiki enwiki (T203286)
  • 12:12 stephanebisson: Starting mwscript extensions/ORES/maintenance/BackfillPageTriageQueue.php --wiki enwiki (T203286)
  • 11:54 ladsgroup@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Don't purge articlequality, draftquality scores (T203286) (duration: 00m 57s)
  • 11:45 banyek: converting enwiki.slots to TokuDB on host dbstore1002 (T205544)
  • 11:42 pmiazga@deploy1001: Synchronized wmf-config: SWAT: Remove dead config relating to wgRelatedArticlesEnabledBucketSize (T202306) (duration: 00m 57s)
  • 11:38 arturo: downtime cloudcontrol1003,1004 for 2h for T203177
  • 11:30 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Create eliminator group at Vietnamese Wikibooks (T202207) (duration: 00m 58s)
  • 11:25 zfilipin@deploy1001: Synchronized wmf-config/InitialiseSettings.php: SWAT: Fix a typo in zhwikiversitys importsources definition (T201328) (duration: 00m 57s)
  • 11:20 zfilipin@deploy1001: Synchronized wmf-config/throttle.php: SWAT: Fix a typo in lift account creation cap for cswiki event (T206119) (duration: 00m 56s)
  • 10:41 jynus: start compressing dbstore1001:x1 tables
  • 09:26 jynus: reducing io overhead temporarily in exchange for crash safety for s5 replicas T184805
  • 09:23 jynus: fixing replication filters on dbstore1002 (again)
  • 08:34 jynus: fixing replication filters on dbstore1002
  • 08:18 jynus: starting importing of certain s3 wikis into eqiad s5 master T184805
  • 07:51 jynus: deploying replication filters to s5 at labsdb1009/10/11 and dbstore1002 T184805
  • 07:06 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@27062b4] (maps1004): Specify WDQS endpoint at wdqs.discovery.wmnet in the service config (T205607) (duration: 00m 28s)
  • 07:05 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@27062b4] (maps1004): Specify WDQS endpoint at wdqs.discovery.wmnet in the service config (T205607)
  • 06:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2055 (duration: 00m 55s)
  • 06:37 marostegui: Deploy schema change on db2055 - T205913
  • 06:37 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2055 (duration: 00m 56s)
  • 06:03 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2085:3311 (duration: 00m 56s)
  • 05:59 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@e1aab7b]: Request Parsoid HTML version 2.0.0 (0866a07) (duration: 03m 32s)
  • 05:57 marostegui: Deploy schema change on db2085:3311 - T205913
  • 05:56 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@e1aab7b]: Request Parsoid HTML version 2.0.0 (0866a07)
  • 05:55 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2085:3311 (duration: 00m 58s)
  • 05:26 marostegui: Deploy schema change on db1067 (s1 eqiad master), lag will be generated - T205913
  • 05:25 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2070 (duration: 00m 57s)
  • 05:24 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/languages/Language.php: T206030 - I985dfa3eb17 (duration: 00m 56s)
  • 05:21 marostegui: Deploy schema change on db1075 (s3 eqiad master), lag will be generated - T205913
  • 05:20 marostegui: Deploy schema change on db2070 - T205913
  • 05:20 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2070 (duration: 00m 56s)
  • 04:45 krinkle@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/NavigationTiming: T205580 - I04c52658fbf6d (duration: 01m 03s)
  • 00:42 Amir1: Evening SWAT is done
  • 00:41 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/GlobalPreferences/resources/ext.GlobalPreferences.global.ooui.js: SWAT: Fail gracefully if we failed to find associated widget (T205991) (duration: 00m 57s)
  • 00:38 mutante: icinga1001 (not prod yet), removing all icinga packages, running puppet to reinstall them, debugging dpkg issue
  • 00:19 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/GlobalPreferences/resources/ext.GlobalPreferences.global.ooui.js: SWAT: Fail gracefully if we failed to find associated widget (T205991) (duration: 00m 55s)

2018-10-02

  • 23:54 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/i18n/en.json: SWAT: Align copyvio log terminology (T199359) (duration: 00m 56s)
  • 23:38 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/PageTriage/modules/ext.pageTriage.views.list/ext.pageTriage.listControlNav.underscore: SWAT: Hide copyvio, none afc filter options behind flag (T205918) (duration: 00m 56s)
  • 23:33 ejegg: updated fundraising CiviCRM from c353eba283 to e3e1963915
  • 23:26 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.24/extensions/ORES/tests/phpunit/includes/HooksTest.php: SWAT: Disable RCFilters in tests (duration: 00m 54s)
  • 23:16 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges_body.php: SWAT: Fix using the old index when new indexes are not there (T205904) (duration: 00m 57s)
  • 22:53 shdubsh: powercycling icinga1001 after removing problematic entry from fstab
  • 22:26 gtirloni: labstore2003 re-started service block_sync
  • 21:39 XioNoX: Fix unused vlans XLink1/2 on asw2-a5
  • 21:15 banyek: enabling puppet on es2001
  • 21:12 banyek: re-enabling and starting backups on host es2001 (T205257)
  • 21:01 gtirloni: labstore2003 stopped service block_sync
  • 20:15 dduvall@deploy1001: Finished scap: group0 to php-1.32.0-wmf.24 (duration: 33m 00s)
  • 20:04 Jeff_Green: authdns-update to deploy new IP for frbast2001.frack.eqiad.wmnet
  • 19:50 XioNoX: update prefix-list fundraising-codfw-internal4 to /24 on pfw3-codfw - T204271
  • 19:42 dduvall@deploy1001: Started scap: group0 to php-1.32.0-wmf.24
  • 19:36 dduvall@deploy1001: Pruned MediaWiki: 1.32.0-wmf.19 (duration: 07m 25s)
  • 19:21 XioNoX: update fw policies on pfw3-eqiad - T204271
  • 19:19 XioNoX: update fw policies on pfw3-codfw - T204271
  • 18:39 XioNoX: replace 10.195.0.73/29 with 10.195.0.65/28 on pfw3-codfw - T204271
  • 18:26 XioNoX: remove old 10.195.0.65/29 from pfw3-codfw - T204271
  • 18:24 jynus: restarting ferm on dbstore2002 T205257
  • 18:08 arlolra: Updated Parsoid to 65d6f82 (T163438, T205674, T205673)
  • 18:07 ariel@deploy1001: Finished deploy [dumps/dumps@a9570fb]: fix incr dumps multiversion conf setting (duration: 00m 06s)
  • 18:07 ariel@deploy1001: Started deploy [dumps/dumps@a9570fb]: fix incr dumps multiversion conf setting
  • 18:01 arlolra@deploy1001: Finished deploy [parsoid/deploy@19053a3]: Updating Parsoid to 65d6f82 (duration: 10m 44s)
  • 17:51 arlolra@deploy1001: Started deploy [parsoid/deploy@19053a3]: Updating Parsoid to 65d6f82
  • 17:37 XioNoX: update NAT for frbast2001 on pfw3-codfw - T204271
  • 17:25 XioNoX: update fw policies on pfw3-eqiad - T204271
  • 17:22 XioNoX: update fw policies on pfw3-codfw - T204271
  • 17:22 andrewbogott: upgraded wikitech-static to remotes/origin/REL1_31
  • 17:18 andrewbogott: upgrading debian packages and MediaWiki version on wikitech-static
  • 16:53 jynus: setup test s3 replication channel on db1110 (filtered)
  • 16:49 XioNoX: assign 10.195.0.129/29 to pfw3-codfw:reth0.2133 - T204271
  • 16:38 cmjohnson1: swapping failed disk db1067 T205780
  • 16:04 ppchelko@deploy1001: Finished deploy [cpjobqueue/deploy@093551f]: Increase cirrusSearchLinksUpdate concurrency (duration: 01m 06s)
  • 16:03 ppchelko@deploy1001: Started deploy [cpjobqueue/deploy@093551f]: Increase cirrusSearchLinksUpdate concurrency
  • 15:50 marxarelli: cutting 1.32.0-wmf.24 branch
  • 15:33 gehel: cleanup old cronjob (cleanup GC logs) on all elasticsearch servers
  • 15:24 akosiaris: upgrade mathoid chart version to 0.0.11
  • 15:24 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:23 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:23 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:23 akosiaris@deploy1001: scap-helm mathoid upgrade production stable/mathoid [namespace: mathoid, clusters: eqiad,codfw]
  • 15:21 akosiaris@deploy1001: scap-helm mathoid finished
  • 15:21 akosiaris@deploy1001: scap-helm mathoid cluster codfw completed
  • 15:21 akosiaris@deploy1001: scap-helm mathoid cluster eqiad completed
  • 15:21 akosiaris@deploy1001: scap-helm mathoid upgrade -h [namespace: mathoid, clusters: eqiad,codfw]
  • 14:11 banyek: powering off dbstore2002.codfw.wmnet for BBU change (T205257)
  • 13:47 marostegui: Deploy schema change on s4 eqiad, this will generate lag on eqiad - T205913
  • 13:06 marostegui: Deploy schema change on s7 eqiad, this will generate lag on eqiad - T205913
  • 12:47 banyek: converting enwiki.content to TokuDB on host dbstore1002 (T205544)
  • 11:58 banyek: converting wikidatawiki.slots to TokuDB on host dbstore1002 (T205544)
  • 11:41 arturo: downtime labstore1007 load check in icinga for 1d
  • 11:21 zeljkof: EU SWAT finished
  • 11:19 ladsgroup@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/FlaggedRevs/frontend/specialpages/reports/ProblemChanges_body.php: SWAT: Use proper index on change_tag table (T205904) (duration: 00m 57s)
  • 10:58 mobrovac@deploy1001: Synchronized rpc/RunSingleJob.php: RunSingleJob: Delay job execution while in read-only mode - T204154 (duration: 00m 57s)
  • 10:34 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2092 (duration: 00m 56s)
  • 10:24 marostegui: Deploy schema change on db2092 - T203709
  • 10:24 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2092 (duration: 00m 56s)
  • 09:30 marostegui: Deploy schema change on s2 eqiad master, lag will be generated T205913
  • 08:43 banyek: disabling puppet on es2001 and disabling backups too
  • 08:28 marostegui: Deploy schema change on s6 eqiad master, lag will be generated T205913
  • 08:16 jynus: test-recovering some s3 wiki data onto db1110 (s5)
  • 08:04 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1110 (duration: 00m 56s)
  • 08:04 marostegui: Deploy schema change on s5 eqiad master, lag will be generated T205913
  • 08:01 banyek: converting wikidatawiki.content to TokuDB on host dbstore1002 (T205544)
  • 07:54 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2071 (duration: 00m 55s)
  • 07:50 marostegui: Deploy schema change on db2071 T205913
  • 07:50 mholloway-shell@deploy1001: Finished deploy [tilerator/deploy@6c80537] (maps1004): Disable event logging requests and remove HTTP proxy (duration: 00m 17s)
  • 07:49 mholloway-shell@deploy1001: Started deploy [tilerator/deploy@6c80537] (maps1004): Disable event logging requests and remove HTTP proxy
  • 07:49 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2071 (duration: 00m 56s)
  • 07:48 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@0bf513a] (maps1004): Remove HTTP proxy (duration: 00m 16s)
  • 07:48 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@0bf513a] (maps1004): Remove HTTP proxy
  • 07:42 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2088:3311 (duration: 00m 56s)
  • 07:36 marostegui: Deploy schema change on db2088:3311 T205913
  • 07:36 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2088:3311 (duration: 00m 55s)
  • 07:32 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2072 (duration: 00m 55s)
  • 07:18 marostegui: Deploy schema change on db2072 T205913
  • 07:17 marostegui@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2072 (duration: 01m 02s)
  • 05:22 _joe_: stopped tilerator on maps1004, was spamming like crazy
  • 01:18 ejegg: updated CiviCRM from e7a620a00c to c353eba283

2018-10-01

  • 23:44 eileen: update process control revision is b9c7ab286e - define but not enable Redis
  • 23:43 foks: disabling 2FA for two users
  • 23:31 twentyafterfour: finished creating database tables
  • 23:18 twentyafterfour: creating ipblocks_restrictions table (command run on mwmaint2001: foreachwiki sql.php maintenance/archives/patch-ipblocks_restrictions-table.sql)
  • 22:52 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 3, feeds check timeouts (duration: 06m 22s)
  • 22:46 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 3, feeds check timeouts
  • 22:45 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 2, feeds check timeouts (duration: 03m 57s)
  • 22:41 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures, take 2, feeds check timeouts
  • 22:41 ppchelko@deploy1001: Finished deploy [restbase/deploy@babfe80]: Don't log the request for transform failures (duration: 12m 27s)
  • 22:29 ppchelko@deploy1001: Started deploy [restbase/deploy@babfe80]: Don't log the request for transform failures
  • 21:17 arlolra: Updated Parsoid to 224ecde (T198504, T133673, T202666)
  • 20:45 arlolra@deploy1001: Finished deploy [parsoid/deploy@8ff45db]: Updating Parsoid to 224ecde (duration: 08m 22s)
  • 20:37 arlolra@deploy1001: Started deploy [parsoid/deploy@8ff45db]: Updating Parsoid to 224ecde
  • 20:35 gehel@deploy1001: Finished deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (duration: 14m 00s)
  • 20:21 gehel@deploy1001: Started deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph
  • 19:52 gehel@deploy1001: Finished deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (wdqs1009 only) (duration: 00m 30s)
  • 19:51 gehel@deploy1001: Started deploy [wdqs/wdqs@a637583]: New version of WDQS GUI, updater and blazegraph (wdqs1009 only)
  • 19:27 ppchelko@deploy1001: Finished deploy [restbase/deploy@7caf4d8]: Content-negotiation filter going live T128040 (duration: 03m 38s)
  • 19:24 ppchelko@deploy1001: Started deploy [restbase/deploy@7caf4d8]: Content-negotiation filter going live T128040
  • 19:11 thcipriani: restarting ci jenkins for new plugins
  • 18:33 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable page issues A/B test at 20% rate (T200792) (duration: 00m 56s)
  • 18:28 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=enwiki --prefix (T201009)
  • 18:23 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/maintenance/includes/DeleteLocalPasswords.php: T201009 (duration: 00m 56s)
  • 18:17 catrope@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/PageTriage/: Ensure valid AFC option is selected (T205324, T205168); hide copyvio behind a global var and URL param (duration: 00m 57s)
  • 18:12 catrope@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Enable page issues A/B test at 5% rate (T200792) (duration: 00m 59s)
  • 17:59 XioNoX: push fw change on pfw3-eqiad - T205888
  • 17:57 XioNoX: push fw change on pfw3-codfw - T205888
  • 17:28 onimisionipe@deploy1001: Finished deploy [wdqs/wdqs@a637583]: Test deployment for recent updater build and GUI changes. Also blazegraph updates (wdqs1009) (duration: 01m 46s)
  • 17:27 onimisionipe@deploy1001: Started deploy [wdqs/wdqs@a637583]: Test deployment for recent updater build and GUI changes. Also blazegraph updates (wdqs1009)
  • 17:06 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1093, db1064 (duration: 00m 57s)
  • 17:02 jynus: stopping some mariadb instances on dbstore1001 and starting compression T201392
  • 16:26 ppchelko@deploy1001: Started restart [cpjobqueue/deploy@58f9ed3]: Fix KafkaConsumer not connected error
  • 15:16 jynus: stopping db1064 to clone it to dbstore1001
  • 15:00 akosiaris: upgrade etherpad to 1.7.0-2
  • 14:14 anomie@deploy1001: Synchronized wmf-config/InitialiseSettings.php: Setting MCR migration stage to write-both/read-new on mediawikiwiki (T198308) (duration: 00m 56s)
  • 13:51 banyek: Downtimed the slave lag monitoring on dbstore1002 while the tables are being converted (T205544)
  • 12:38 akosiaris: upload hfst_3.13.0~r3461-1+wmf2 to apt.wikimedia.org/jessie-wikimedia/main. T199962
  • 12:26 banyek: converting enwiki.categorylinks to TokuDB on host dbstore1002 (T205544)
  • 12:19 banyek: stopping replication on s2@dbstore2002: the tables are being compressed (T204930)
  • 12:15 banyek: enabling puppet on labsdb1009, labsdb1010, labsdb1011 (T183983)
  • 12:13 zeljkof: EU SWAT finished
  • 12:12 zfilipin@deploy1001: Synchronized php-1.32.0-wmf.23/extensions/ContentTranslation/: SWAT: Fix error in CXTransclusionNode#afterRender method (T205521) (duration: 00m 59s)
  • 11:56 jynus: stopping db1093 to clone it to dbstore1001
  • 11:52 arturo: install prometheus-openstack-exporter 0.0.8-3 in reprepro T203177
  • 11:41 zfilipin@deploy1001: Synchronized wmf-config: SWAT: Remove unused default source language config for CX (duration: 00m 57s)
  • 11:16 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Repool db2058 (duration: 00m 55s)
  • 11:09 _joe_: killed bash runner.sh by user ladsgroup on mwmaint2001
  • 10:58 jynus@deploy1001: Synchronized wmf-config/db-codfw.php: Depool db2058 (duration: 00m 57s)
  • 10:52 jynus@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1093, db1064 (duration: 00m 57s)
  • 10:42 jdrewniak@deploy1001: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 56s)
  • 10:41 jdrewniak@deploy1001: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 00m 59s)
  • 10:21 godog: repair /dev/sdf1 /dev/sde1 on ms-be1041 - T199198
  • 10:15 Amir1: ladsgroup@mwmaint2001:~$ mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --prefix on all CentralAuth wikis (T201009)
  • 10:10 Amir1: mwscript extensions/CentralAuth/maintenance/deleteLocalPasswords.php --wiki=fawiki --delete (T201009)
  • 09:33 godog: test formatting sdh and sdi on ms-be2040 with crc=0 - T199198
  • 09:15 volans: Set Racktables in read-only mode - T199083
  • 08:56 _joe_: rolling restart of parsoid in codfw; afterwards, parsoid will connect to the MediaWiki API via HTTPS
  • 08:54 _joe_: rolling restart of parsoid in eqiad
  • 07:54 banyek: disabling puppet on labsdb1009, labsdb1010, labsdb1011 (T183983)
  • 07:00 mholloway-shell@deploy1001: Finished deploy [kartotherian/deploy@ab6cb74] (maps1004): Update kartotherian to latest (T205462) (duration: 00m 16s)
  • 07:00 mholloway-shell@deploy1001: Started deploy [kartotherian/deploy@ab6cb74] (maps1004): Update kartotherian to latest (T205462)
  • 06:39 mholloway-shell@deploy1001: Finished deploy [tilerator/deploy@22f90ee] (maps1004): Update tilerator to latest (T205462) (duration: 00m 19s)
  • 06:39 mholloway-shell@deploy1001: Started deploy [tilerator/deploy@22f90ee] (maps1004): Update tilerator to latest (T205462)
  • 05:35 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Repool db1103:3312 (duration: 00m 56s)
  • 05:19 marostegui: Stop replication on dbstore1002 and db1103:3312 in sync
  • 05:19 marostegui@deploy1001: Synchronized wmf-config/db-eqiad.php: Depool db1103:3312 (duration: 01m 01s)
  • 05:19 mholloway-shell@deploy1001: Finished deploy [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b (duration: 03m 18s)
  • 05:15 mholloway-shell@deploy1001: Started deploy [mobileapps/deploy@07cbfb4]: Update mobileapps to a1fa41b
  • 05:07 marostegui: Deploy schema change on s1 codfw master - T203709
  • 03:21 onimisionipe: restarting inplace reindexing of enwiki and viwiki at codfw - T204362


Archives

See Server admin log/Archives.